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Abstract — The design of digital filters is a fundamental process in 
the context of digital signal processing. The purpose of this paper is to 
study the use of l p norms (for 2 < p < oo) as design criteria for digital 
filters, and to introduce a set of algorithms for the design of Finite (FIR) 
and Infinite (IIR) Impulse Response digital filters based on the Iterative 
Reweighted Least Squares (IRLS) algorithm. The proposed algorithms rely 
on the idea of breaking the l p filter design problem into a sequence of 
approximations rather than solving the original l p problem directly. It is 
shown that one can efficiently design filters that arbitrarily approximate 
a desired l p solution (for 2 <p < oo) including the commonly used (or 
minimax) design problem. A method to design filters with different norms 
in different bands is presented (allowing the user for better control of the 
signal and noise behavior per band). Among the main contributions of this 
work is a method for the design of magnitude l p IIR filters. Experimental 
results show that the algorithms in this work are robust and efficient, 
improving over traditional off-the-shelf optimization tools. The group of 
proposed algorithms form a flexible collection that offers robustness and 
efficiency for a wide variety of digital filter design applications. 

I. Introduction 

The design of digital filters has fundamental importance in digital 
signal processing. One can find applications of digital filters in many 
diverse areas of science and engineering including medical imaging, 
audio and video processing, oil exploration, and highly sophisticated 
military applications. Furthermore, each of these applications benefits 
from digital filters in particular ways, thus requiring different proper- 
ties from the filters they employ. Therefore it is of critical importance 
to have efficient design methods that can shape filters according to 
the user's needs. 

In this work we use the discrete l p norm as the criterion for 
designing efficient digital filters. We also introduce a set of algo- 
rithms, all based on the Iterative Reweighted Least Squares (IRLS) 
method, to solve a variety of relevant digital filter design problems. 
The proposed family of algorithms has proven to be efficient in 
practice; these algorithms share theoretical justification for their use 
and implementation. Finally, the document makes a point about the 
relevance of the l p norm as a useful tool in filter design applications. 

The rest of this chapter is devoted to motivating the problem. 
Section |I-A| introduces the general filter design problem and some 
of the signal processing concepts relevant to this work. Section [FC| 
presents the basic Iterative Reweighted Least Squares method, one 
of the key concepts in this document. Section |T-D| introduces Finite 
Impulse Response (FIR) filters and covers theoretical motivations 
for l p design, including previous knowledge in l p optimization (both 
from experiences in filter design as well as other fields of science 
and engineering). Similarly, Section |I-E| introduces Infinite Impulse 
Response (IIR) filters. These last two sections lay down the structure 
of the proposed algorithms, and provide an outline for the main 
contributions of this work. 

Chapters [n] and [Til] formally introduce the different l p filter 
design problems considered in this work and discuss their IRLS- 
based algorithms and corresponding results. Each of these chapters 
provides a literary review of related previous work as well as a 
discussion on the proposed methods and their corresponding results. 



An important contribution of this work is the extension of known and 
well understood concepts in l v FIR filter design to the IIR case. 

A. Digital filter design 

When designing digital filters for signal processing applications 
one is often interested in creating objects h £ H N in order to alter 
some of the properties of a given vector x £ R M (where < M, N < 
oo). Often the properties of x that we are interested in changing lie in 
the frequency domain, with X = J-(x) being the frequency domain 
representation of x given by 

x & X = Axe?"** 

where Ax and <p x are the amplitude and phase components of x, 
and T(-) : 1H N h-> R°° is the Fourier transform operator defined by 

N-l 

T{h} = H(u) = h n e- ]u,n Mue [-n, vr] (1) 

n = 

So the idea in filter design is to create filters h such that the 
Fourier transform H of h posesses desirable amplitude and phase 
characteristics. 

The. filtering operator is the convolution operator (*) defined by 
(x * h)(n) = x(m)h(n — m) 

m 

An important property of the convolution operator is the Convolution 
Theorem [1] which states that 

x*h& X ■ H = {A X ■ Ab)^* x+ * b) (2) 

where {Ax, 4>x} anc ^ {Ah , 4>h} represent the amplitude and phase 
components of X and H respectively. It can be seen that by filtering 
x with h one can apply a scaling operator to the amplitude of x and 
a biasing operator to its phase. 

A common use of digital filters is to remove a certain band of 
frequencies from the frequency spectra of x (such as typical lowpass 
filters). Other types of filters include band-pass, high-pass or band- 
reject filters, depending on the range of frequencies that they alter. 

B. The notion of approximation in l p filter design 

Once a filter design concept has been selected, the design problem 
becomes finding the optimal vector h £ K n that most closely 
approximates our desired frequency response concept (we will denote 
such optimal vector by h*). This approximation problem will heavily 
depend on the measure by which we evaluate all vectors h £ K" to 
choose h* . 

In this document we consider the discrete l p norms defined by 

l|a||j.= W^M P V a £ R w (3) 

as measures of optimality, and consider a number of filter design 
problems based upon this criterion. The work explores the Iterative 
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Reweighted Least Squares (IRLS) approach as a design tool, and 
provides a number of algorithms based on this method. Finally, this 
work considers critical theoretical aspects and evaluates the numerical 
properties of the proposed algorithms in comparison to existing 
general purpose methods commonly used. It is the belief of the author 
(as well as the author's advisor) that the IRLS approach offers a more 
tailored route to the l v filter design problems considered, and that it 
contributes an example of a made-for-purpose algorithm best suited 
to the characteristics of l p filter design. 

C. The IRLS algorithm 

Iterative Reweighted Least Squares (IRLS) algorithms define a 
family of iterative methods that solve an otherwise complicated 
numerical optimization problem by breaking it into a series of 
weighted least squares (WLS) problems, each one easier in principle 
than the original problem. At iteration i one must solve a weighted 
least squares problem of the form 

min ||w(ft 4 _i)/(fti)l|2 (4) 

hi 

where w(-) is a specific weighting function and /(■) is a function 
of the filter. Obviously a large class of problems could be written in 
this form (large in the sense that both w(-) and /(•) can be defined 
arbitrarily). One case worth considering is the linear approximation 
problem defined by 

min \\D - Ch\\ (5) 

h 

where D e R M and C G R MxN are given, and ||<|| is an arbitrary 
measure. One could write /(•) in as 

f(h) = D-Ch 

and attempt to find a suitable function w(-) to minimize the arbitrary 
norm ||-|| in |5j. In vector notation, at iteration i one can write (|4j 
as follows, 

min||to(fti_i)(£)-CAii)||2 (6) 

hi 

One can show that the solution of {6]l for any iteration is given by 

h = (C T WC) _1 C T W£> 

with W = diag(iu 2 ) (where w is the weighting vector). To solve 
problem ([6| above, one could use the following algorithm: 

1) Set initial weights wo 

2) At the z-th iteration find h % = (C T W l _iC)" 1 C T W i _i£> 

3) Update Wj as a function of hi (i.e. Wj = ~W(hi) ) 

4) Iterate steps 2 and 3 until a certain stopping criterion is reached 
This method will be referred in this work as the basic IRLS algorithm. 

An IRLS algorithm is said to converge if the algorithm produces 
a sequence of points hi such that 

Urn hi — h* 

i — >oo 

where h* is a fixed point defined by 

h* = (c T wcy 1 c T WD 

with W* = W(/i*). In principle one would want h* = h* (as 
defined in Section |I-B| >. 

IRLS algorithms have been used in different areas of science and 
engineering. Their atractiveness stem from the idea of simplifying a 
difficult problem as a sequence of weighted least squares problems 
that can be solved efficiently with programs such as Matlab or LA- 
PACK. However (as it was mentioned above) success is determined 
by the existence of a weighting function that leads to a fixed point 
that happens to be at least a local solution of the problem in question. 



This might not be the case for any given problem. In the case of l p 
optimization one can justify the use of IRLS methods by means of 
the following theorem: 

Theorem 1 (Weight Function Existence theorem): Let gfc(tj) be a 
Chebyshev set and define 

M 

H(h;w) = ^ h k g k (w) 

where h = (ho, hi, . . . , Iim) T ■ Then, given D(ui) continuous on 
[0, 7t] and l<g<p<oo the following are identical sets: 

• {h | H(h;uj) is a best weighted L p approximation toD(u;) on 

[0,*]}. 

• {h | H(h;ui) is a best weighted L q approximation to D(ui) on 
[0,*]}. 

Furthermore, the theorem above is valid if the interval [0, 7r] is 
replaced by a finite point set SI C [0, n] (this theorem is accredited 
to Motzkin and Walsh (2), (3)). 

Theorem [T] is fundamental since it establishes that weights exist 
so that the solution of an L p problem is indeed the solution of a 
weighted L q problem (for arbitrary p,q> 1). Furthermore the results 
of Theorem [Tj remain valid for l p and l q . For our purposes, this 
theorem establishes the existence of a weighting function so that the 
solution of a weighted h problem is indeed the solution of an l p 
problem; the challenge then is to find the corresponding weighting 
function. The remainder of this document explores this task for a 
number of relevant filter design problems and provides a consistent 
computational framework. 

D. Finite Impulse Response (FIR) l„ design 

A Finite Impulse Response (FIR) filter is an ordered vector h £ 
R" (where < N < oo), with a complex polynomial form in the 
frequency domain given by 

JV-l 

H(uj) = ]T h n e~ J " n 

n = 

The filter H(uj) contains amplitude and phase components 
{Ah(oj), </>ir(w)} that can be designed to suit the user's purpose. 

Given a desired frequency response D(uj), the general l p approx- 
imation problem is given by 

min H-D(uj) — H(h\u)\\ p 

h 

In the most basic scenario D(uu) would be a complex valued function, 
and the optimization algorithm would minimize the l p norm of the 
complex error function e(uj) = D(uj) — H(to); we refer to this case 
as the complex l p design problem (refer to Section |Il-C[ >. 

One of the caveats of solving complex approximation problems is 
that the user must provide desired magnitude and phase specifications. 
In many applications one is interested in removing or altering a range 
of frequencies from a signal; in such instances it might be more 
convenient to only provide the algorithm with a desired magnitude 
function while allowing the algorithm to find a phase that corresponds 
to the optimal magnitude design. The magnitude l p design problem 
is given by 

min \\D(u)) - \H(h;u>)\ \\ p 

h 

where D(u>) is a real, positive function. This problem is discussed 
in Section Hl-DI 

Another problem that uses no phase information is the linear phase 
l p problem. It will be shown in Section |Tl-B| fhat this problem can be 
formulated so that only real functions are involved in the optimization 
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problem (since the phase component of H(tj) has a specific linear 
form). 

An interesting case results from the idea of combining different 
norms in different frequency bands of a desired function D(lj). 
One could assign different p- values for different bands (for example, 
minimizing the error energy (£2) in the passband while using a 
minimax error (£00) approach in the stopband to keep control of 
noise). The frequency-varying l p problem is formulated as follows, 



mm 

h 



\(D - H)(u pb )\\ p + \\(D - H)(wsb)\\, 



where {ui p b,ui p b\ are the passband and stopband frequency ranges 
respectively (and 2<p, q<oo). 

Perhaps the most relevant problem addressed in this work is the 
Constrained Least Squares (CLS) problem. In a continuous sense, a 
CLS problem is defined by 



||d(w)-fr(w)|| a 
subject to \d(uj) — H(uj)\ <r 



mm 

h 



The idea is to minimize the error energy across all frequencies, but 
ensuring first that the error at each frequency does not exceed a 
given tolerance r. Section |II-F| explains the details for this problem 
and shows that this type of formulation makes good sense in filter 
design and can efficiently be solved via IRLS methods. 

1) The IRLS algorithm and FIR literature review: A common 
approach to dealing with highly structured approximation problems 
consists in breaking a complex problem into a series of simpler, 
smaller problems. Often, one can even prove important mathemat- 
ical properties in this way. Consider the l p approximation problem 
introduced in (3}, 

min \\f(h)\\ p (7) 

h 

For simplicity at this point we can assume that /(•) : M. N t-¥ R M is 
linear. It is relevant to mention that (J7]l is equivalent to 



min ||/(h)||J 



(8) 



In its most basic form the l p IRLS algorithm works by rewriting ([8} 
into a weighted least squares problem of the form 



min \\w(h)f(h)\\ 2 

h 



(9) 



Since a linear weighted least squares problem like l|9j has a closed 
form solution, it can be solved in one step. Then the solution is used 
to update the weighting function, which is kept constant for the next 
closed form solution and so on (as discussed in Section p-C) . 

One of the earlier works on the use of IRLS methods for l p approx- 
imation was written by Charles Lawson |4|-[6|, in part motivated by 
problems that might not have a suitable loo algorithm. He looked 
at a basic form of the IRLS method to solve loo problems and 
extended it by proposing a multiplicative update of the weighting 
coefficients at each iteration (that is, v)k+i(t*)) = /(w) • w k (io)'). 
Lawson's method triggered a number of papers and ideas; however 
his method is sensitive to the weights becoming numerically zero; in 
this case the algorithm must restart. A number of ideas |5|, |6| have 
been proposed (some from Lawson himself) to prevent or deal with 
these occurrences, and in general his method is considered somewhat 
slow. 

John Rice and Karl Usow (5), (7) extended Lawson's method to 
the general l p problem (2 <p < 00) by developing an algorithm based 
on Lawson's that also updates the weights in a multiplicative form. 
They used the results from Theorem[TJby Motzkin and Walsh |2|, |3| 
to guarantee that a solution indeed exists for the l p problem. They 
defined 

Wh+i(u) = wt{uj)\t k (w)f 



where 



and 



7(P~2) 
7(p-2)+l 



/? = - 
2 7 



2( 7 (p - 2) + 1) 

with 7 being a convergence parameter and e(cj) = d(u)) — H(uj). The 
rest of the algorithm works the same way as the basic IRLS method; 
however the proper selection of 7 could allow for strong convergence 
(note that for 7 = we obtain the basic IRLS algorithm). 

Another approach to solve {7} consists in a partial updating 
strategy of the filter coefficients rather than the weights, by using 
a temporary coefficient vector defined by 



a k+1 = [C T Wl , W fc C]- 1 C T W^W fc A d 



(10) 



The filter coefficients after each iteration are then calculated by 

a k+1 = \a k+1 + (1 - A)a fc (1 1) 

where A is a convergence parameter (with < A < 1). This approach 
is known as the Karlovitz method [8], and it has been claimed that 
it converges to the global optimal solution for even values of p such 
that 4<p<oo. However, in practice several convergence problems 
have been found even under such assumptions. One drawback is that 
the convergence parameter A has to be optimized at each iteration 
via an expensive line search process. Therefore the overall execution 
time becomes rather large. 

S. W. Kahng |9| developed an algorithm based on Newton- 
Raphson's method that uses 

1 



A 



1 



to get 



a.fc+i 



a k +i + (p — 2)a k 



(12) 



(13) 



p-1 

This selection for A is based upon Newton's method to minimize e 
(the same result was derived independently by Fletcher, Grant and 
Hebden |10|). The rest of the algorithm follows Karlovitz's approach; 
however since A is fixed there is no need to perform the linear 
search for its best value. Since Kahng's method is based on Newton's 
method, it converges quadratically to the optimal solution. Kahng 
proved that his method converges for all cases of A and for any 
problem (at least in theory). It can be seen that Kahng's method is 
a particular case of Karlovitz's algorithm, with A as defined in \\2) . 
Newton-Raphson based algorithms are not warranted to converge to 
the optimal solution unless they are somewhat close to the solution 
since they require to know and invert the Hessian matrix of the 
objective function (which must be positive definite |11|). However, 
their associated quadratic convergence makes them an appealing 
option. 

Burrus, Barreto and Selesnick developed a method [7 J, |12], |13| 
that combines the powerful quadratic convergence of Newton's meth- 
ods with the robust initial convergence of the basic IRLS method, thus 
overcoming the initial sensitivity of Newton-based algorithms and 
the slow linear convergence of Lawson-based methods. To accelerate 
initial convergence, their approach to solve jTJ uses p = o * 2, where 
a is a convergence parameter (with 1 < o < 2). At any given iteration, 
p increases its value by a factor of a. This is done at each iteration, 
so to satisfy 

p k = min (p des ,a ■ p k -i) (14) 

where pd es corresponds to the desired l p norm. The implementation 
of each iteration follows Karlovitz's method using the particular 
selection of p given by l |14[ (. 
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which can be posed as a weighted least squares problem of the form 



Fig. 1. Homotopy approach for IRLS l p filter design. 



It is worth noting that the method outlined above combines several 
ideas into a powerful approach. By not solving the desired l v problem 
from the first iteration, one avoids the potential issues of Newton- 
based methods where convergence is guaranteed within a radius of 
convergence. It is well known that for 2 < p < oo there exists 
a continuum of l v solutions (as shown in Figure [TJ. By slowly 
increasing p from iteration to iteration one hopes to follow the 
continuum of solutions from I2 towards the desired p. By choosing a 
reasonable a the method can only spend one iteration at any given p 
and still remain close enough to the optimal path. Once the algorithm 
reaches a neighborhood of the desired p, it can be allowed to iterate 
at such p, in order to converge to the optimal solution. This process 
is analogous to homotopy, a commonly used family of optimization 
methods |14| . 

While I2 and loo designs offer meaningful approaches to filter 
design, the Constrained Least Squares (CLS) problem offers an 
interesting tradeoff to both approaches |15| . In the context of filter 
design, the CLS problem seems to be first presented by lohn Adams 
1 16| in 1991. The problem Adams posed is a Quadratic Programming 
(QP) problem, well suited for off-the-shelf QP tools like those based 
on Lagrange multiplier theory [16|. However, Adams posed the 
problem in such a way that a transition band is required. Burrus et al. 
presented a formulation |17|-|19| where only a transition frequency 
is required; the transition band is induced; it does indeed exist but 
is not specified (it adjusts itself optimally according to the constraint 
specifications). The method by Burrus et al. is based on Lagrange 
multipliers and the Karush-Kuhn-Tucker (KKT) conditions. 

An alternative to the KKT-based method mentioned above is the 
use of IRLS methods where a suitable weighting function serves as 
the constraining function over frequencies that exceed the constraint 
tolerance. Otherwise no weights are used, effectively forcing a least- 
squares solution. While this idea has been suggested by Burrus et al., 
one of the main contributions of this work is a thorough investigation 
of this approach, as well as proper documentation of numerical 
results, theoretical findings and proper code. 



E. Infinite Impulse Response (IIR) l p design 

In contrast to FIR filters, an Infinite Impulse Response (IIR) filter 



is defined by two ordered vectors a £ R and b G 
< M, N < 00), with frequency response given by 

B(u) 



lM+1 



(where 



M 

E Ke~ 

n=0 



A(u) 



1 + E a n e-i un 



Hence the general l p approximation problem is 

- - D(w) 



mm 

a n ,b n 



M 

E b n e~ 

n=0 



1+ E a„e-i™ 

71 = 1 



(15) 



mm 



w(co) 



E bn 



\ 



1+ E a n e-^ n 

71 = 1 



D(oj) 



J 



(16) 



It is possible to design similar problems to the ones outlined in 
Section |I-D| for FIR filters. However, it is worth keeping in mind 
the additon al complication s that IIR design involves, including the 
nonlinear least squares problem presented in Section [i-E 1 1 below. 

1) Least squares IIR literature review: The weighted nonlinear 
formulation presented in suggests the possibility of taking ad- 
vantage of the flexibilities in design from the FIR problems. However 
this point comes at the expense of having to solve at each iteration a 
weighted nonlinear I2 problem. Solving least squares approximations 
with rational functions is a nontrivial problem that has been studied 
extensively in diverse areas including statistics, applied mathematics 
and electrical engineering. One of the contributions of this document 
is a presentation in Section [IlI-B| on the subject of I2 IIR filter design 
that captures and organizes previous relevant work. It also sets the 
framework for the proposed methods used in this document. 

In the context of IIR digital filters there are three main groups 
of approaches to |T6j. Section |III-B1| presents relevant work in 
the form of traditional optimization techniques. These are methods 
derived mainly from the applied mathematics community and are in 
general efficient and well understood. However the generality of such 
methods occasionally comes at the expense of being inefficient for 
some particular problems. Among the methods found in literature, the 
Davidon-Flecther-Powell (DFP) algorithm [20|, the damped Gauss- 
Newton method (21) , (22), the Levenberg-Marquardt algorithm [23), 
1 24 1, and the method of Kumaresan |25J, |26| form the basis of a 
number of methods to solve (JT3J . 

A different approach to l |15[ l from traditional optimization methods 
consists in linearizing \16\ by transforming the problem into a sim- 
pler, linear form. While in principle this proposition seems inadequate 
(as the original problem is being transformed), Section [TlI-B2| presents 
some logical attemps at linearizing {16} and how they connect with 
the original problem. The concept of equation error (a weighted form 
of the solution error that one is actually interested in solving) has 
been introduced and employed by a number of authors. In the context 
of filter design, E. Levy [27] presented an equation error linearization 
formulation in 1959 applied to analog filters. An alternative equation 
error approach presented by C. S. Burrus |28| in 1987 is based on 
the methods by Prony [29| and Pade [30|. The method by Burrus can 
be applied to frequency domain digital filter design, and is used in 
selected stages in some of the algorithms presented in this work. 

An extension of the equation error methods is the group of iterative 
prefiltering algorithms presented in Section |III-B8| These methods 
build on equation error methods by weighting (or prefiltering) their 
equation error formulation iteratively, with the intention to converge 
to the minimum of the solution error. Sanathanan and Koerner |31| 
presented in 1963 an algorithm (SK) that builds on an extension 
of Levy's method by iterating on Levy's formulation. Sid-Ahmed, 
Chottera and Jullien [32] presented in 1978 a similar algorithm to 
the SK method but applied to the digital filter problem. 

A popular and well understood method is the one by Steiglitz and 
McBride (33), (34) introduced in 1966. The SMB method is time- 
domain based, and has been extended to a number of applications, 
including the frequency domain filter design problem |35|. Steiglitz 
and McBride used a two-phase method based on linearization. 
Initially (in Mode-1) their algorithm is essentially that of Sanathanan 
and Koerner but in time. This approach often diverges when close to 
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the solution; therefore their method can optionally switch to Mode-2, 
where a more traditional derivative-based approach is used. 

A more recent linearization algorithm was presented by L. Jackson 
1 36 1 in 2008. His approach is an iterative prefiltering method based 
directly in frequency domain, and uses diagonalization of certain 
matrices for efficiency. 

While intuitive and relatively efficient, most linearization methods 
share a common problem: they often diverge close to the solution 
(this effect has been noted by a number of authors; a thorough review 
is presented in |35 |). Section |III-B 1 3| presents the quasilinearization 
method derived by A. Soewito |35| in 1990. This algorithm is robust, 
efficient and well-tailored for the least squares IIR problem, and is 
the method of choice for this work. 

II. Finite Impulse Response Filters 

This chapter discusses the problem of designing Finite Impulse 
Response (FIR) digital filters according to the l v error criterion using 
Iterative Reweighted Least Squares methods. Section |II-A| gives an 
introduction to FIR filter design, including an overview of traditional 
FIR design methods. For the purposes of this work we are particularly 
interested in h and l^ design methods, and their relation to relevant 
l p design problems. Section [Tl-B | formally introduces the linear phase 
problem and presents results that are common to most of the problems 
considered in this work. Finally, Sections |II-C| through |II-E| present 
the application of the Iterative Reweighted Least Squares algorithm 
to other important problems in FIR digital filter design, including the 
relevant contributions of this work. 

A. Traditional design of FIR filters 

Section [LA| introduced the notion of digital filters and filter design. 
In a general sense, an FIR filter design problem has the form 



mm 

h 



l/WII 



where /(•) defines an error function that depends on h, and |]-|| is 
an abitrary norm. While one could come up with a number of error 
formulations for digital filters, this chapter elaborates on the most 
commonly used, namely the linear phase and complex problems (both 
satisfy the linear form f(h) = D — Ch as will be shown later in this 
chapter). As far as norms, typically the h and l x norms are used. 
One of the contributions of this work is to demonstrate the usefulness 
of the more general l p norms and their feasibility by using efficient 
IRLS-based algorithms. 

1) Traditional design of least squares (h) FIR filters: Typically, 
FIR filters are designed by discretizing a desired frequency response 
Hd{oj) by taking L frequency samples at {ujo, wi> • ■ • , wi-i}. One 
could simply take the inverse Fourier transform of these samples 
and obtain L filter coefficients; this approach is known as the 
Frequency Sampling design method (28), which basically interpolates 
the frequency spectrum over the samples. However, it is often more 
desirable to take a large number of samples to design a small filter 
(large in the sense that L 3> N, where L is the number of frequency 
samples and N is the filter order). The weighted least-squares (I2) 
norm (which considers the error energy) is defined by 



£2 



e(w)|| 2 = 



W(u)\D(u) -H(uj)\ 2 duj 



(17) 



where D(ui) and H(uj) = J-(h) are the desired and designed 
amplitude responses respectively. By acknowledging the convexity 



of 
(17 



17b, one can drop the root term; therefore a discretized form of 
is given by 



£ 2 = ^w( W)1 )|flK)-%)f 



(18) 



The solution of Equation ( [18} is given by 

h = ( C T W T WCV 1 C'W'WD 



(19) 



where W = diag(y /; u;) contains the weighting vector w. By solving 
( |19[ l one obtains an optimal I2 approximation to the desired frequency 
response D(uj). Further discussion and other variations on least 
squares FIR design can be found in (28). 

2) Traditional design of minimax (loo ) FIR filters: In contrast to I2 
design, an l^ filter minimizes the maximum error across the designed 
filter's frequency response. A formal formulation of the problem (37), 
(38) is given by 



minmax \D(to) — H(uj;h)\ 

h uj 



(20) 



(21) 



A discrete version of \20\ is given by 

minmax \D(ujk) — Ckh\ 

h k 

Within the scope of filter design, the most commonly approach to 
solving \2\ \ is the use of the Alternation Theorem (39|, in the context 
of linear phase filters (to be discussed in Section |II-B[ >. In a nutshell 
the alternation theorem states that for a length- FIR linear phase 
filter there are at least + 1 extrema points (or frequencies). The 
Remez exchange algorithm (28), (37), (38) aims at finding these 
extrema frequencies iteratively, and is the most commonly used 
method for the minimax linear phase FIR design problem. Other 
approaches use more standard linear programming methods including 
the Simplex algorithm (40), |41| or interior point methods such as 
Karmarkar's algorithm |42|. 

The Zoo problem is fundamental in filter design. While this docu- 
ment is not aimed covering the 1^ problem in depth, portions of this 
work are devoted to the use of IRLS methods for standard problems 
as well as some innovative uses of minimax optimization. 

B. Linear phase l p filter design 

Linear phase FIR filters are important tools in signal processing. 
As will be shown below, they do not require the user to specify a 
phase response in their design (since the assumption is that the desired 
phase response is indeed linear). Besides, they satisfy a number of 
symmetry properties that allow for the reduction of dimensions in the 
optimization process, making them easier to design computationally. 
Finally, there are applications where a linear phase is desired as such 
behavior is more physically meaningful. 

1 ) Four types of linear phase filters: The frequency response of 
an FIR filter h(n) is given by 

N— 1 

H(cj) = Hn)e- j "" 

In general, H{uj) = R(oj) + is a periodic complex function 

of ui (with period 27r). Therefore it can be written as follows, 



R(uj)+jl{bj) 



where the magnitude response is given by 



A(u) = \H(u)\ = v/i?(o;) 2 +/(w) 2 
and the phase response is 



(22) 



(23) 



(ui) — arctan 



R(u>) 



However A(uj) is not analytic and 4>(oj) is not continuous. From 
a computational point of view \22) would have better properties if 
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both A(u) and 4>(uj) were continuous analytic functions of oj; an 
important class of filters for which this is true is the class of linear 
phase filters [28]. 

Linear phase filters have a frequency response of the form 



H(u) 



A(u)e 



J>(w) 



(24) 



where A(lj) is the real, continuous amplitude response of H(uj) and 

<p(u) = K 1 + K 2 u 

is a linear phase function in ui (hence the name); K\ and K2 
are constants. The jumps in the phase response correspond to sign 
reversals in the magnitude resulting as defined in \23\ . 

Consider a length- N FIR filter (assume for the time being that N 
is odd). Its frequency response is given by 



Odd 


Even 


M—l 

A(u) = h(M) + 2 ^2 h ( n ) cosw(Af - n) 

n = 

t/>(w) = —coM 


fe; 


Odd 


M—l 

A(w) = 2 ^2 K n ) sincj(M - n) 

n — Q 

<h( w ) = S _ uM 


ven 


Even 


2 l 

A(ui) = h(M) + 2 Y2 H n ) cos "(M ~ n ) 

n=0 

0(w) = -uM 


a 




K _i 
2 A 




Odd 


A(u>) = 2 V} h(n) sin u(M — n) 
0(w) = f - o;M 



TABLE I 

The four types of linear phase FIR filters. 



H(u) = Y, h (n)e- 



-ju>M 



jw(M-n) 



(25) 



where M = ^-w^- Equation i25i can be written as follows, 



H(lj) = e 



-juiM 



jojM 



+ ... + h(M - l)e JUJ + h(M) 



+h(M + l)e" J " + . 



h(2M)e 



-juMi 



(26) 



It is clear that for an odd-length FIR filter to have the linear phase 
form described in {24}, the term inside braces in l |26[ > must be a real 
function (thus becoming A(uj)), By imposing even symmetry on the 
filter coefficients about the midpoint (n = M), that is 



h(k) = h(2M - k) 



equation \2b\ becomes 



H(u) = e 



-jujM 



h(M) + 2 Y h ( n ) cosw(A/ 



(27) 



Similarly, with odd symmetry (i.e. h(k) — h(2M — k)) equation \ 26\ 
becomes 



H(u) = eP^~ uM) 2 J2 h(n)smu}{M -n) 



(28) 



Note that the term h(M) disappears as the symmetry condition 
requires that 

h(M) = h(N — M — 1) = -h{M) = 
Similar expressions can be obtained for an even-length FIR filter, 



< ' ^2 h(n)e 



-juiM 



(29) 



It is clear that depending on the combinations of iV and the symmetry 
of h(n), it is possible to obtain four types of filters |28|, [43], [44 1. 
Table |I] shows the four possible linear phase FIR filters described by 
\24\, where the second column refers to the type of filter symmetry. 



2) lRLS-based methods: Section |II-B1| introduced linear phase 
filters in detail. In this section we cover the use of IRLS methods to 
design linear phase FIR filters according to the l v optimality criterion. 
Recall from Section [Tl-B 1 | that for any of the four types of linear phase 
filters their frequency response can be expressed as 

H(u) = A{u)e 3{Kl+K2Ul) 

Since A(oj) is a real continuous function as defined by Table [I] one 
can write the linear phase l p design problem as follows 



min ||-D(w) — A{uj\ a) 



(30) 



where a relates to h by considering the symmetry properties outlined 
in Table|l] Note that the two objects from the objective function inside 
the l p norm are real. By sampling l |30[ > one can write the design 
problem as follows 

min y^\D(uj k ) - A(u k ; a)\ p 
k 

or 

min VlTJfe -C k a\ p (31) 

a — J 

k 

where D k is the fc-th element of the vector D representing the 
sampled desired frequency response D(oj k ), and C k is the fc-th row 
of the trigonometric kernel matrix as defined by Table [I] 

One can apply the basic IRLS approach described in Section [LC] 
to solve by posing this problem as a weighted least squares one: 

min Yw k \D k -C k a\ 2 (32) 

k 

The main issue becomes iteratively finding suitable weights w for 
\32\ so that the algorithm converges to the optimal solution a* of 
the l p problem \30\ . Existence of adequate weights is guaranteed by 
Theorem [T| as presented in Section [LC{ finding these optimal weights 
is indeed the difficult part. Clearly a reasonable choice for w is that 
which turns {32} into {3TJ, namely 

w = \D- Ca\ p -' 2 

Therefore the basic IRLS algorithm for problem < |3 1 [ > would be: 

1) Initialize the weights wo (a reasonable choice is to make them 
all equal to one). 

2) At the i-th iteration the solution is given by 



a i+1 = [C T WfWiC] _1 C T WfWiD 



(33) 
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3) Update the weights with 

w i+1 = \D - Ca i+ i\ p ~ 2 

4) Repeat the last steps until convergence is reached. 

It is important to note that Wj = diag(y / io7). In practice it has 
been found that this approach has practical deficiencies, since the 
inversion required by l |33[ > often leads to an ill-posed problem and, 
in most cases, convergence is not achieved. 

As mentioned before, the basic IRLS method has drawbacks that 
make it unsuitable for practical implementations. Charles Lawson 
considered a version of this algorithm applied to the solution of 
problems (for details refer to |4|). His method has linear convergence 
and is prone to problems with proportionately small residuals that 
could lead to zero weights and the need for restarting the algorithm. In 
the context of l p optimization, Rice and Usow |5 1 built upon Lawson's 
method by adapting it to l p problems. Like Lawson's methods, the 
algorithm by Rice and Usow updates the weights in a multiplicative 
manner; their method shares similar drawbacks with Lawson's. Rice 
and Usow defined 



where 



and 



„ _ 7(P - 2) 



T(p-2) + l 
p-2 



B=- = 

' 2 7 2 7 (p - 2) + 2 

and follow the basic algorithm. 

L. A. Karlovitz realized the computational problems associated 
with the basic IRLS method and improved on it by partially updating 
the filter coefficient vector. He defines 



Gi+l 



[C T W l T W 1 C]" 1 C T WfW 1 £> 



and uses a in 



H+l 



= Xa i+ i + (1 — X)ai 



(34) 



(35) 



where A £ [0, 1] is a partial step parameter that must be adjusted at 
each iteration. Karlovitz's method |8| has been shown to converge 
globally for even values of p (where 2 < p < oo). In practice, 
convergence problems have been found even under such assumptions. 
Karlovitz proposed the use of line searches to find the optimal value 
of A at each iteration, which basically creates an independent opti- 
mization problem nested inside each iteration of the IRLS algorithm. 
While computationally this search process for the optimal A makes 
Karlovitz's method impractical, his work indicates the feasibility of 
IRLS methods and proves that partial updating indeed overcomes 
some of the problems in the basic IRLS method. Furthermore, 
Karlovitz's method is the first one to depart from a multiplicative 
updating of the weights in favor of an additive updating on the filter 
coefficients. In this way some of the problems in the Lawson-Rice- 
Usow approach are overcome, especially the need for restarting the 
algorithm. 

S. W. Kahng built upon the findings by Karlovitz by considering 
the process of finding an adequate A for partial updating. He applied 
Newton-Raphson's method to this problem and proposed a closed 
form solution for A, given by 

' (36) 



A 



p-l 



resulting in 



Oj+i = Xa i+1 + (1 — A)a, 



(37) 



The rest of Kahng's algorithm follows Karlovitz's approach. How- 
ever, since A is fixed, there is no need to perform the linear search 



at each iteration. Kahng's method has an added benefit: since it 
uses Newton's method to find A, the algorithm tends to converge 
much faster than previous approaches. It has indeed been shown to 
converge quadratically. However, Newton-Raphson-based algorithms 
are not guaranteed to converge globally unless at some point the 
existing solution lies close enough to the solution, within their radius 
of convergence [11]. Fletcher, Grant and Hebden |10] derived the 
same results independently. 

Burrus, Barreto and Selesnick (7J, (12| , |13) modified Kahng's 
methods in several important ways in order to improve on their initial 
and final convergence rates and the method's stability (we refer to this 
method as BBS). The first improvement is analogous to a homotopy 
|14| . Up to this point all efforts in l p filter design attempted to solve 
the actual l p problem from the first iteration. In general there is no 
reason to believe that an initial guess derived from an unweighted 
h formulation (that is, the 1% design that one would get by setting 
Wo = 1) will look in any way similar to the actual l p solution that 
one is interested in. However it is known that there exists a continuity 
of l p solutions for 1 <p < oo. In other words, if a J is the optimal 
h solution, there exists a p for which the optimal l p solution a* is 
arbitrarily close to a^; that is, for a given 5>0 



\o-2 — <5 



for some p G (2, oo) 



This fact allows anyone to gradually move from an l p solution to an 
l q solution. 

To accelerate initial convergence, the BBS method of Burrus et al. 
initially solves for h by setting po = 2 and then sets pi = a ■ pt-i, 
where a is a convergence parameter defined by \ <o<2. Therefore 
at the z-th iteration 



Pi = min (pdes, crpi-i) 



(38) 



where pd es corresponds to the desired l p solution. The implementa- 
tion of each iteration follows Karlovitz's method with Kahng's choice 
of A, using the particular selection of p given by {38} , 

To summarize, define the class of IRLS algorithms as follows: after 
i iterations, given a vector a.i the IRLS iteration requires two steps, 

1) Find w % = /(di) 

2) Find a i+1 = g(wi,a,i) 

The following is a summary of the IRLS-based algorithms dis- 
cussed so far and their corresponding updating functions: 

1) Basic IRLS algorithm. 

. w t = \D Ca t \ p ~ 2 
. Wj = diag( x /«77) 

. a l+ i = [C T WfW,C] _1 C T W, T W,D 

2) Rice-Usow-Lawson (RUL) method 

. Wi = udl-D - Ca t \^ 
. Wj = diag(iOj) 

. a l+1 = [C T WfW l C]- 1 C T WfW,.D 

. _ 7(P~2) 
7 (p-2) + l 

• 7 constant 

3) Karlovitz' method 

. Wi = \D- Ca,i\ v - 2 

• Wi = diag( % /Io7) 
. a l+1 = A [C T Wf W,C 

• A constant 

4) Kahng's method 

. », = \D Ca,| p - 2 
. W t = diag(^/wJ7) 

• Qi+l = 



C T WfW l Z3 + (1 - X)cn 



C T WfWiCl 



_1 C T WfWi£> + 



p-2 

p-i 
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5) BBS method 

• Pi = min (pdes,o- ■ Pi-i) 
. Wi = \D- C<Zi| K - 2 

• Wi = diag(0i7i) 

. a»+i = (^) [C T W?'W i C]" 1 C T W?'Wir> + 

• a constant 

3) Modified adaptive IRLS algorithm: Much of the performance 
of a method is based upon whether it can actually converge given a 
certain error measure. In the case of the methods described above, 
both convergence rate and stability play an important role in their 
performance. Both Karlovitz and RUL methods are supposed to 
converge linearly, while Kahng's and the BBS methods converge 
quadratically, since they both use a Newton-based additive update 
of the weights. 

Barreto showed in 1 1 2| that the modified version of Kahng's 
method (or BBS) typically converges faster than the RUL algorithm. 
However, this approach presents some peculiar problems that depend 
on the transition bandwidth /3, For some particular values of /3, the 
BBS method will result in an ill-posed weight matrix that causes the 
l p error to increase dramatically after a few iterations as illustrated 
in Figure [5] (where / = lj/2-it). 

a) f =0 2. f =0 25 
s 

11 I 1 1 1 1 1 1 1 1 



1 - 




50 100 150 200 

Iteration 

Fig. 2. Error jumps on IRLS methods. 

Two facts can be derived from the examples in Figure [2] for this 
particular bandwidth the error increased slightly after the fifth and 
eleventh iterations, and increased dramatically after the sixteenth. 
Also, it is worth to notice that after such increase, the error started 
to decrease quadratically and that, at a certain point, the error 
became flat (thus reaching the numerical accuracy limits of the digital 
system). 

The effects of different values of a were studied to find out if a 
relationship between a and the error increase could be determined. 
Figure [5] shows the l p error for different values of (3 and for a = 1.7. 
It can be seen that some particular bandwidths cause the algorithm 
to produce a very large error. 

Our studies (as well as previous work from J. A. Barreto |12| ) 
demonstrate that this error explosion occurs only for a small range of 



L error for different bandwidts, f -0.2 




Bandwidth p 



Fig. 3. Relationship between bandwidth and error jumps. 



bandwidth specifications. Under most circumstances the BBS method 
exhibits fast convergence properties to the desired solution. However 
at this point it is not understood what causes the error increase and 
therefore this event cannot be anticipated. In order to avoid such 
problem, we propose the use of an adaptive scheme that modifies 
the BBS step. As p increases the step from a current l p guess to the 
next also increases, as described in fj38[ >. In other words, at the i-th 
iteration one approximates the £ 2c r* solution (as long as the algorithm 
has not yet reached the desired p); the next iteration one approximates 
Z 2cr i+i. There is always a possibility that these two solutions lie far 
apart enough that the algorithm takes a descent step so that the Z 2(r i+i 
guess is too far away from the actual Z 2ct «+i solution. This is better 
illustrated in Figure [4] 




Fig. 4. A step too long for IRLS methods. 

The conclusions derived above suggest the possibility to use an 
adaptive algorithm |45| that changes the value of a so that the 
error always decreases. This idea was implemented by calculating 
temporary new weight and filter coefficient vectors that will not 
become the updated versions unless their resulting error is smaller 
than the previous one. If this is not the case, the algorithm "tries" 
two values of <r, namely 

ctl = o * (1 — <5) and oh = o~ * (1 + S) (39) 

(where S is an updating variable). The resulting errors for each 
attempt are calculated, and o is updated according to the value that 
produced the smallest error. The error of this new o is compared 
to the error of the nonupdated weights and coefficients, and if the 
new o produces a smaller error, then such vectors are updated; 
otherwise another update of o is performed. The modified adaptive 
IRLS algorithm can be summarized as follows, 

1) Find the unweighted approximation ao = [C T C] 1 C T D 
and use po — 2a (with 1 < o < 2) 



a) L error (f =0.2, 0=0.048, K =1 .75) 



L 2 -L m Error for L designs 




10 12 
Iterations 



b) Variations in K 



10 12 

Iterations 



16 18 20 



Fig. 5. FIR design example using adaptive method, a) l p error obtained with 
the adaptive method; b) Change of a. 



2) Iteratively solve (B4b and 1 35 1 using Ai = 



and find the 



or the i-m iteration 



z„ L and £a H 



to compare it with e. 



resulting error Si 

3) If £i > Ei_i, 

. Calculate <[39j» 
• Select the smallest of e„ 

until a value is found that results in a decreasing error 

Otherwise iterate as in the BBS algorithm. 

The algorithm described above changes the value of a that causes 
the algorithm to produce a large error. The value of a is updated as 
many times as necessary without changing the values of the weights, 
the filter coefficients, or p. If an optimal value of a exists, the 
algorithm will find it and continue with this new value until another 
update in a becomes necessary. 

The algorithm described above was implemented for several com- 
binations of a and /3; for all cases the new algorithm converged faster 
than the BBS algorithm (unless the values of a and /3 are such that 
the error never increases). The results are shown in Figure[5]a for the 
specifications from Figure|2] Whereas using the BBS method for this 
particular case results in a large error after the sixteenth iteration, the 
adaptive method converged before ten iterations. 

Figure [5]b illustrates the change of a per iteration in the adaptive 
method, using an update factor of 8 = 0.1. The l v error stops 
decreasing after the fifth iteration (where the BBS method introduces 
the large error); however, the adaptive algorithm adjusts the value of 
a so that the l p error continues decreasing. The algorithm decreased 
the initial value of a from 1.75 to its final value of 1.4175 (at the 
expense of only one additional iteration with a = 1.575). 

One result worth noting is the relationship between Z2 and 
Zoo solutions and how they compare to l p designs. Figure [6] 
shows a comparison of designs for a length-21 Type-I linear 
phase low pass FIR filter with transition band defined by 
/ = {0.2,0.24}. The curve shows the I2 versus Zoo errors (namely 
£2 and £00); the values of p used to make this curve were p = 
{2, 2.2, 2.5, 3, 4, 5, 7, 10, 15, 20, 30, 50, 60, 100, 150, 200, 400, 00} 
(Matlab's fir Is and f irpm functions were used to design the h 
and Zoo filters respectively). Note the very small decrease in £00 after 
p reaches 100. The curve suggests that a better compromise between 
£2 and £<x> can be reached by choosing 2 < p < 00. Furthermore, 
to get better results one can concentrate on values between p — 5 



Fig. 6. Relationship between I2 and Zoo errors for l p FIR filter design. 



and p = 20; fortunately, for values of p so low no numerical 
complications arise and convergence is reached in a few iterations. 



C. Complex l p problem 

The design of linear phase filters has been intensively discussed 
in literature. For the two most common error criteria (I2 and Zoo), 
optimal solution algorithms exist. The least squares norm filter can 
be found by solving an overdetermined system of equations, whereas 
the Chebishev norm filter is easily found by using either the Remez 
algorithm or linear programming. For many typical applications, lin- 
ear phase filters are good enough; however, when arbitrary magnitude 
and phase constraints are required, a more complicated approach 
must be taken since such design results in a complex approximation 
problem. By replacing C in the linear phase algorithm with a complex 
Fourier kernel matrix, and the real desired frequency vector D with 
a complex one, one can use the same algorithm from Section [II-B 3 1 
to design complex l p filters. 



D. Magnitude l p problem 

In some applications, the effects of phase are not a necessary factor 
to consider when designing a filter. For these applications, control of 
the filter's magnitude response is a priority for the designer. In order 
to improve the magnitude response of a filter, one must not explicitly 
include a phase, so that the optimization algorithm can look for the 
best filter that approximates a specified magnitude, without being 
constrained about optimizing for a phase response too. 

1 ) Power approximation formulation: The magnitude approxima- 
tion problem can be formulated as follows: 



min ||.D(w) - \H(u;h)\ 

h 



(40) 



Unfortunately, the second term inside the norm (namely the absolute 
value function) is not differentiable when its argument is zero. 
Although one could propose ways to work around this problem, 
we propose the use of a different design criterion, namely the 
approximation of a desired magnitude squared. The resulting problem 
is 

\D{Lof-\H{u-h)\ 2 \\l 



mm 

h 
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The autocorrelation r(n) of a causal length-iV FIR filter h(n) is 
given by 

N-l 

r(n) = h(n) * h(-n) = ^ h(k)h(n + k) (41) 

fc=-(JV-l) 

The Fourier transform of the autocorrelation r(n) is known as the 
Power Spectral Density function [46] R(oj) (or simply the SPD), and 
is defined as follows, 

JV-l 

R{lu) = J2 r{n)e- 3Uln 

n = -(N-l) 

N-l JV-l 

n=-(N-l) k=-(N-l) 

From the properties of the Fourier Transform (47 1 §3.3] one can show 
that there exists a frequency domain relationship between h(n) and 
r(n) given by 

R(lj) = H(lj) ■ H*{-u>) = \H(lj)\ 2 

This relationship suggests a way to design magnitude- squared filters, 
namely by using the filter's autocorrelation coefficients instead of the 
filter coefficients themselves. In this way, one can avoid the use of 
the non-differentiable magnitude response. 

An important property to note at this point is the fact that since 
the filter coefficients are real, one can see from l |41[ > that the 
autocorrelation function r(n) is symmetric; thus it is sufficient to 
consider its last TV values. As a result, the PSD can be written as 



E. l p error as a function of frequency 

Previous sections have discussed the importance of complex least- 
square and Chebishev error criteria in the context of filter design. 
In many applications any of these two approaches would provide 
adequate results. However, a case could be made where one might 
want to minimize the error energy in a range of frequencies while 
keeping control of the maximum error in a different band. This idea 
results particularly interesting when one considers the use of different 
l p norms in different frequency bands. In principle one would be 
interested in solving 

min \\D(oj pb ) - H(uj pb ; h)\\ v + \\D(oj sb ) - H(uj sb ; h)\\ q (43) 

h 

where {u) p b £ Q p b,Wsb £ 0, ab } represent the pass and stopband 

frequencies respectively. In principle one would want Q p b PI £l s b = 
{0}. Therefore problem {43} can be written as 



min pfy] \D(uj ph ) - H(cu pb ; h)\P 

+ a \Diujsb) - H(u sh ; h)\ 



(44) 



One major obstacle in l |44| l is the presence of the roots around the 
summation terms. These roots prevent us from writing l |44} in a 
simple vector form. Instead, one can consider the use of a similar 
metric function as follows 



min \D{u) pb ) — H(uj pb ; h)\ p + 

h *■ — ' 

J2 \D{u sb ) - H{uj 3b ;h)\ q 



(45) 



■■ r(0) + 2r(n) cost 



in a similar way to the linear phase problem. 

The symmetry property introduced above allows for the use of 
the l p linear phase algorithm of section ( |II-B} to obtain the au- 
tocorrelation coefficients of h(n). However, there is an important 
step missing in this discussion: how to obtain the filter coefficients 
from its autocorrelation. To achieve this goal, one can follow a 
procedure known as Spectral Factorization. The objective is to use the 
autocorrelation coefficients r £ R N instead of the filter coefficients 
h £ H N as the optimization variables. The variable transformation is 
done using l |42[ >, which is not a one-to-one transformation. Because of 
the last result, there is a necessary condition for a vector r £ R" to 
be a valid autocorrelation vector of a filter. This is summarized 1 48 1 
in the spectral factorization theorem, which states that r £ M. N is the 
autocorrelation function of a filter h(n) if and only if R(uo) > for all 
cj £ [0, tt]. This turns out to be a necessary and sufficient condition 
(48) for the existence of r(n). Once the autocorrelation vector r 
is found using existing robust interior-point algorithms, the filter 
coefficients can be calculated via spectral factorization techniques. 

Assuming a valid vector r G M. N can be found for a particular 
filter h, the problem presented in d40b can be rewritten as 



(42) 



L(wf < R(oj) < U{uj) 2 Vu£[0, tt] 



In i 42 i the existence condition R(uj) > is redundant since < L(uj) 



and, thus, is not included in the problem definition. For each uj, the 
constraints of |42j constitute a pair of linear inequalities in the vector 
r; therefore the constraint is convex in r. Thus the change of variable 
transforms a nonconvex optimization problem in h into a convex 
problem in r. 



This expression is similar to < |44| > but does not include the root terms. 
An advantage of using the IRLS approach on l |45[ > is that one can 
formulate this problem in the frequency domain and properly separate 
residual terms from different bands into different vectors. In this 
manner, the l p modified measure given by l |45[ l can be made into 
a frequency-dependent function of p(oj) as follows, 



nun||I>( W )-ff( W ;fc)||^ 



H(uj;h) 



Therefore this frequency-varying l p problem can be solved following 
the modified IRLS algorithm outlined in Section |II-B3| with the 
following modification: at the i-th iteration the weights are updated 
according to 

w t = \D- Ca t \ p(uj) - 2 

It is fundamental to note that the proposed method does not indeed 
solve a linear combination of l p norms. In fact, it can be shown 
that the expression l |45[ > is not a norm but a metric. While from a 
theoretical perspective this fact might make l |45[ ) a less interesting 
distance, as it turns out one can use (45} to solve the far more 
interesting CLS problem, as discussed below in Section [Tl-F| 



F. Constrained Least Squares ( CLS) problem 

One of the common obstacles to innovation occurs when knowl- 
edge settles on a particular way of dealing with problems. While 
new ideas keep appearing suggesting innovative approaches to design 
digital filters, it is all too common in practice that h and loo dominate 
error criteria specifications. This section is devoted to exploring a 
different way of thinking about digital filters. It is important to 
note that up to this point we are not discussing an algorithm yet. 
The main concern being brought into play here is the specification 
(or description) of the design problem. Once the Constrained Least 
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(46) 



Squares (CLS) problem formulation is introduced, we will present 
an IRLS implementation to solve it, and will justify our approach 
over other existing approaches. It is the author's belief that under 
general conditions one should always use our IRLS implementation 
over other methods, especially when considering the associated 
management of transition regions. 

The CLS problem was introduced in Section |I-D| and is repeated 
here for clarity, 

min \\D(u)-H(u;h)\\ a 

h 

subject to \D(lj) — H(uj;h)\<T 

To the best of our knowledge this problem was first introduced in the 
context of filter design by John Adams 1 16] in 1991. The main idea 
consists in approximating iteratively a desired frequency response in a 
least squares sense except in the event that any frequency exhibits an 
error larger than a specified tolerance r. At each iteration the problem 
is adjusted in order to reduce the error on offending frequencies 
(i.e. those which do not meet the constraint specifications). Ideally, 
convergence is reached when the altered least squares problem 
has a frequency response whose error does not exceed constraint 
specifications. As will be shown below, this goal might not be attained 
depending on how the problem is posed. 

Adams and some collaborators have worked in this problem and 
several variations 1 15]. However his main (and original) problem was 
illustrated in |16| with the following important assumption: the defi- 
nition of a desired frequency response must include a fixed non-zero 
width transition band. His method uses Lagrange multiplier theory 
and alternation methods to find frequencies that exceed constraints 
and minimize the error at such locations, with an overall least squares 
error criterion. 

Burrus, Selesnick and Lang |17| looked at this problem from a 
similar perspective, but relaxed the design specifications so that only 
a transition frequency needs to be specified. The actual transition 
band does indeed exist, and it centers itself around the specified 
transition frequency; its width adjusts as the algorithm iterates (con- 
straint tolerances are still specified). Their solution method is similar 
to Adams' approach, and explicitly uses the Karush-Kuhn-Tucker 
(KKT) conditions together with an alternation method to minimize 
the least squares error while constraining the maximum error to meet 
specifications. 

C. S. Burrus and the author of this work have been working 
on the CLS problem using IRLS methods with positive results. 
This document is the first thorough presentation of the method, 
contributions, results and code for this approach, and constitutes one 
of the main contributions of this work. It is crucial to note that 
there are two separate issues in this problem: on one hand there 
is the matter of the actual problem formulation, mainly depending 
on whether a transition band is specified or not; on the other hand 
there is the question of how the selected problem description is 
actually met (what algorithm is used). Our approach follows the 
problem description by Burrus et al. shown in )17) with an IRLS 
implementation. 

1 ) Two problem formulations: As mentioned in Section |II-F| one 
can address problem \A6) in two ways depending on how one views 
the role of the transition band in a CLS problem. The original problem 
posed by Adams in |16| can be written as follows, 

min \\D(uj) — H(oj\ h)\\2 
h (47) 

s.t. \D(u) -H(u;h)\<T V u> £ [0, ui p b] U [u s b,n] 

where < u) p b < oj s b < t. From a traditional standpoint this formula- 
tion feels familiar. It assigns fixed frequencies to the transition band 
edges as a number of filter design techniques do. As it turns out, 
however, one might not want to do this in CLS design. 



An alternate formulation to l |47| > could implicitly introduce a 
transition frequency oj t b (where oj p b < ui t b < ^s&); the user only 
specifies uj t b- Consider 



mm 

h 



\\D(u) -H(u;h)\\ 2 V^G[0,7r] 



subject to \D(ui) — H(u; h)\ <r Vu£ [0, uj p b] U [uisb, 7r] 

(48) 

The algorithm at each iteration generates an induced transition band 
in order to satisfy the constraints in 1 48 1. Therefore {ui p b, w s f>} vary 
at each iteration. 



H(m) 



H(m) 
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Fig. 7. Two formulations for Constrained Least Squares problems. 

It is critical to point out the differences between \A1) and l |48[ >. 
Figure|7]a explains Adams' CLS formulation, where the desired filter 
response is only specified at the fixed pass and stop bands. At any 
iteration, Adam's method attempts to minimize the least squares error 
(£2) at both bands while trying to satisfy the constraint r. Note 
that one could think of the constraint requirements in terms of the 
Chebishev error £00 by writing \A1\ as follows, 

min \\D{u) - H(u);h)\\ 2 

h 

s.t. \\D(u) - H(u);h)\\oo<T £ [0,cj pb ] U [cj sb ,7r] 

In contrast, Figure [7]b illustrates our proposed problem \4S\ . The idea 
is to minimize the least squared error £2 across all frequencies while 
ensuring that constraints are met in an intelligent manner. At this 
point one can think of the interval {ui p b, ^sb) as an induced transition 
band, useful for the purposes of constraining the filter. Section [II-F2| 
presents the actual algorithms that solve (48} , including the process 
of finding {uj pb , ui sb }. 

It is important to note an interesting behavior of transition bands 
and extrema points in I2 and l^, filters. Figure [8] shows I2 and 
Zoo length-15 linear phase filters (designed using Matlab's firls 
and f irpm functions); the transition band was specified at {oj p b = 
0A/n,u} 3 b = O.5/71-}. The dotted I2 filter illustrates an important 
behavior of least squares filters: typically the maximum error of an 
I2 filter is located at the transition band. The solid filter shows 
why minimax filters are important: despite their larger error across 
most of the bands, the filter shows the same maximum error at all 
extrema points, including the transition band edge frequencies. In 
a CLS problem then, typically an algorithm will attempt to reduce 
iteratively the maximum error (usually located around the transition 
band) of a series of least squares filters. 

Another important fact results from the relationship between the 
transition band width and the resulting error amplitude in 1^, filters. 
Figure [9] shows two loo designs; the transition bands were set at 
{0.4/7T, O.5/71-} for the solid line design, and at {0.4/7T, O.6/71-} for 
the dotted line one. One can see that by widening the transition band 
a decrease in error ripple amplitude is induced. 

These two results together illustrate the importance of the transition 
bandwidth for a CLS design. Clearly one can decrease maximum 
error tolerances by widening the transition band. Yet finding the 
perfect balance between a transition bandwidth and a given tolerance 
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Fig. 8. Comparison of I2 and l x niters. 




Fig. 9. Effects of transition bands in filters. 



can prove a difficult task, as will be shown in Section [II-F2| Hence 
the relevance of a CLS method that is not restricted by two types 
of specifications competing against each other. In principle, one 
should just determine how much error one can live with, and allow 
an algorithm to find the optimal transition band that meets such 
tolerance. 

2) Two problem solutions: Section [Tl-F 1 1 introduced some impor- 
tant remarks regarding the behavior of extrema points and transition 
bands in I2 and loo filters. As one increases the constraints on an h 
filter, the result is a filter whose frequency response looks more and 
more like an filter. 

Section |II-E| introduced the frequency-varying problem and an 
IRLS-based method to solve it. It was also mentioned that, while the 
method does not solve the intended problem (but a similar one), it 
could prove to be useful for the CLS problem. As it turns out, in CLS 
design one is merely interested in solving an unweighted, constrained 
least squares problem. In this work, we achieve this by solving a 
sequence of weighted, unconstrained least squares problems, where 
the sole role of the weights is to "constraint" the maximum error of 
the frequency response at each iteration. In other words, one would 
like to find weights w such that 

min \\D(ui) - H(u;h)\\ 2 

h 

s.t. \\D(u) - iJ(w;fe)||cx)<r V uj E [0, io pb ] U [u ab , ir] 
is equivalent to 

min \\w(u) ■ (D(w) - H(lj; h))\\ 2 

h 

Hence one can revisit the frequency-varying design method and use 



it to solve the CLS problem. Assuming that one can reasonably 
approximate 1^ by using high values of p, at each iteration the main 
idea is to use an l p weighting function only at frequencies where the 
constraints are exceeded. A formal formulation of this statement is 



w{e(uj)) = 



if |e(w)|>r 
otherwise 



Assuming a suitable weighting function existed such that the 
specified tolerances are related to the frequency response constraints, 
the IRLS method would iterate and assign rather large weights to 
frequencies exceeding the constraints, while inactive frequencies get 
a weight of one. As the method iterates, frequencies with large errors 
move the response closer to the desired tolerance. Ideally, all the 
active constraint frequencies would eventually meet the constraints. 
Therefore the task becomes to find a suitable weighting function that 
penalizes large errors in order to have all the frequencies satisfying 
the constraints; once this condition is met, we have reached the 
desired solution. 

a) Polynomial weighting (p=100| b) Linear-log view (p=100) c) Linear-log view (p=5{J0) 



Fig. 10. CLS polynomial weighting function. 

One proposed way to find adequate weights to meet constraints is 
given by a polynomial weighting function of the form 



w(u) = 1 + 



where r effectively serves as a threshold to determine whether a 
weight is dominated by either unity or the familiar l p weighting term. 
Figure [TO] illustrates the behavior of such a curve. 

L Initial guess 



0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 

f 



Fig. 1 1 . Original l 2 guess for CLS algorithm. 

In practice the method outlined above has proven robust par- 
ticularly in connection with the specified transition band design. 
Consider the least squares design in Figure [TJJ (using a length- 
21 Type-I linear phase low-pass FIR filter with linear transition 
frequencies {0.2, 0.25}). This example illustrates the typical effect of 
CLS methods over I2 designs; the largest error (in an l^ sense) can be 
located at the edges of the transition band. Figures [T2"| and [T3] illustrate 
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CLS Solution (t=0.06) 

1.2 I 1 1 1 1 1 1 — 



x 
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Fig. 12. CLS design example using mild constraints. 

CLS Solution (t=0.03) 

1.2 I 1 1 1 1 1 1 1 1 1 1 
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Fig. 13. CLS design example using tight constraints. 



a) l_ 2 Solution 




0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 



b) Intermediate Solution 
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Fig. 14. CLS design example without transition bands. 



design examples using the proposed approach. Figure [12] shows an 
example of a mild constraint (r = 0.6), whereas [T5] illustrates an 
advantage of this method, associated to a hard constraint (r = 0.3). 
The method is trying iteratively to reduce the maximum error towards 
the constraint; however the specified constraint in Figure [T3] is such 
that even at the point where an equiripple response is reached for 
the specified transition bands the constraint is not met. At this point 
the method converges to an optimal l p solution that approximates 
equiripple as p increases (the examples provided use p — 50). 

A different behavior occurs when no transition bands are defined. 
Departing from an initial h guess (as shown in Figure [l4]a) the 
proposed IRLS-based CLS algorithm begins weighting frequencies 
selectively in order to reduce the loa error towards the constraints r 
at each iteration. Eventually an equiripple behavior can be observed 
if the constraints are too harsh (as in Figure [l4]b). The algorithm 
will keep weighting until all frequencies meet the constraints (as 
in Figure [l4]c). The absence of a specified transition band presents 
some ambiguity in defining valid frequencies for weighting. One 
cannot (or rather should not) apply weights too close to the transition 
frequency specified as this would result in an effort by the algorithm 
to create a steep transition region (which as mentioned previously 
is counterintuitive to finding an equiripple solution). In a sense, this 
would mean having two opposite effects working at the same time and 
the algorithm cannot accommodate both, usually leading to numerical 
problems. 
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Fig. 15. Definition of induced transition band. 

In order to avoid these issues, an algorithm can be devised that 
selects a subset of the sampled frequencies for weighting purposes 
at each iteration. The idea is to identify the largest ripple per band at 
each iteration (the ripple associated with the largest error for a given 
band) and select the frequencies within that band with errors equal 
or smaller than such ripple error. In this way one avoids weighting 
frequencies around the transition frequency. This idea is illustrated 
in Figure [13] 

The previous example is fundamental since it illustrates the rel- 
evance of this method: since for a particular transition band the 
tightest constraint that one can get is given by the equiripple (or 
minimax) design (as shown in Section |II-F1} , a problem might arise 
when specifications are tighter than what the minimax design can 
meet. Adams found this problem (as reported in |16|); his method 
breaks under these conditions. The method proposed here overcomes 
an inadequate constraint and relaxes the transition band to meet the 
constraint. 

It is worth noting that the polynomial weighting form works even 
when no transition bands are specified (this must become evident 
from Figure [l4]c above). However, the user must be aware of some 
practical issues related to this approach. Figure [16] shows a typical 
CLS polynomial weighting function. Its "spiky" character becomes 
more dramatic as p increases (the method still follows the homotopy 
and partial updating ideas from previous sections) as shown in Figure 
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a) Weights at early iteration 
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b) Weights near convergence 
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solution in an efficient manner, with the added flexibility that milder 
constraints will result in CLS designs. 



a) CLS Envelope Solution 
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Fig. 16. CLS weights. 



|16| b. It must be evident that the algorithm will assign heavy weights 
to frequencies with large errors, but at p increases the difference in 
weighting exaggerates. At some point the user must make sure that 
proper sampling is done to ensure that frequencies with large weights 
(from a theoretical perspective) are being included in the problem, 
without compromising conputational efficiency (by means of massive 
oversampling, which can lead to ill-conditioning in numerical least 
squares methods). Also as p increases, the range of frequencies 
with signifficantly large weights becomes narrower, thus reducing 
the overall weighting effect and affecting convergence speed. 



b) Envelope weights 




Weighted 
Frequencies 




Fig. 17. CLS envelope weighting function. 

A second weighting form can be defined where envelopes are 
used. The envelope weighting function approach works by assigning 
a weight to all frequencies not meeting a constraint. The value of 
such weights are assigned as flat intervals as illustrated in Figure 
[17] Intervals are determined by the edge frequencies within neigh- 
borhoods around peak error frequencies for which constraints are 
not met. Clearly these neighborhoods could change at each iteration. 
The weight of the fc-th interval is still determined by our typical 
expression, 

w fc (w) = \e(uj+)\ I ^ i 

where uj^ is the frequency with largest error within the fc-th interval. 

Envelope weighting has been applied in practice with good results. 
It is particularly effective at reaching high values of p without 
ill-conditioning, allowing for a true alternative to minimax design. 
Figure [T8| shows an example using r = 0.4; the algorithm managed 
to find a solution for p — 500. By specifying transition bands 
and unachievable constraints one can produce an almost equiripple 



Fig. 18. CLS design example using envelope weights. 

3) Comparison with l p problem: This chapter presented two 
problems with similar effects. On one hand, Section [Il-B 3 1 illustrated 
the fact (see Figure [6j that as p increases towards infinity, an l p filter 
will approximate an one. On the other hand, Section |lI-F| presented 
the constrained least squared problem, and introduced IRLS-based 
algorithms that produce filters that approximate equiripple behavior 
as the constraint specifications tighten. 

A natural question arises: how do these methods compare with 
each other? In principle it should be possible to compare their per- 
formances, as long as the necessary assumptions about the problem 
to be solved are compatible in both methods. Figure [19] shows a 
comparison of these algorithms with the following specifications: 

• Both methods designed length-21 Type-I lowpass linear phase 
digital filters with fixed transition bands defined by / = 
{0.2,0.24} (in normalized linear frequency). 

• The l p experiment used the following values of p = {2, 2.2, 
2.5, 3,4, 5, 7, 10, 15, 20, 30, 50, 70, 100, 170, 400} 

• The CLS experiment used the polynomial weighting method 
with fixed transition bands and a value of p = 60. The error 
tolerances were r = {.06, .077, .078, .8, .084, .088, .093, .1, 
.11, .12, .13, .14, .15, .16, .17, .18} 

Some conclusions can be derived from Figure [T9] Even though at the 
extremes of the curves they both seem to meet, the CLS curve lies just 
below the l p curve for most values of p and r. These two facts should 
be expected: on one hand, in principle the CLS algorithm gives an 
h filter if the constraints are so mild that they are not active for any 
frequency after the first iteration (hence the two curves should match 
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around p = 2). On the other hand, once the constraints become too 
harsh, the fixed transition band CLS method basically should design 
an equiripple filter, as only the active constraint frequencies are l p - 
weighted (this effects is more noticeable with higher values of p). 
Therefore for tight constraints the CLS filter should approximate an 
Zoo filter. 

The reason why the CLS curve lies under the l p curve is because 
for a given error tolerance (which could be interpreted as for a given 
minimax error £00) the CLS method finds the optimal I2 filter. An l p 
filter is optimal in an l p sense; it is not meant to be optimal in either 
the I2 or Zoo senses. Hence for a given r it cannot beat the CLS filter 
in an l 2 sense (it can only match it, which happens around p = 2 or 
p — 00). 

It is important to note that both curves are not drastically different. 
While the CLS curve represents optimality in an L2 — loo sense, 
not all the problems mentioned in this work can be solved using 
CLS filters (for example, the magnitude IIR problem presented in 
Section [TlI-C2| l. Also, one of the objectives of this work is to motivate 
the use of l p norms for filter design problems, and the proposed 
CLS implementations (which absolutely depends on IRLS-based l p 
formulations) are good examples of the flexibility and value of l p 
IRLS methods discussed in this work. 
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Fig. 19. Comparison between CLS and l p problems. 



III. Infinite Impulse Response Filters 

Chapter |Il] introduced the problem of designing l p FIR filters, 
along with several design scenarios and their corresponding design 
algorithms. This chapter considers the design of l p IIR filters and 
examines the similarities and differences compared to l p FIR filter 
design. It was mentioned in Section [LP] that l p FIR design involves a 
polynomial approximation. The problem becomes more complicated 
in the case of IIR filters as the approximation problem is a ratio of 
two polynomials. In fact, the case of FIR polynomial approximation 
is a special form of IIR rational approximation where the denominator 
is equal to 1. 

Infinite Impulse Response (or recursive) digital filters constitute 
an important analysis tool in many areas of science (such as signal 



processing, statistics and biology). The problem of designing IIR 
filters has been the object of extensive study. Several approaches 
are typically used in designing IIR filters, but a general procedure 
follows: given a desired filter specification (which may consist of 
an impulse response or a frequency specification), a predetermined 
approximation error criterion is optimized. Although one of the most 
widely used error criteria in Finite Impulse Response (FIR) filters is 
the least-squares criterion (which in most scenarios merely requires 
the solution of a linear system), least-squares (I2) approximation for 
IIR filters requires an optimization over an infinite number of filter 
coefficients (in the time domain approximation case). Furthermore, 
optimizing for an IIR frequency response leads to a rational (nonlin- 
ear) approximation problem rather than the polynomial problem of 
FIR design. 

As discussed in the previous chapter, a successful IRLS-based l p 
algorithm depends to a large extent in the solution of a weighted 
1-2 problem. One could argue that one of the most important aspects 
contrasting FIR and IIR l p filter design lies in the h optimization 
step. This chapter presents the theoretical and computational issues 
involved in the design of both I2 and l p IIR filters and explores several 
approaches taken to handle the resulting nonlinear I2 optimization 
problem. Section [III-A| introduces the IIR filter formulation and the 
nonlinear least-squares design problem. Section [III-B | presents the I2 
problem more formally, covering relevant methods as a manner of 
background and to lay down a framework for the approach proposed 
in this work. Some of the methods covered here date back to the 
1960's, yet others are the result of current active work by a number 
of research groups; the approach employed in this work is described 
in section |III-B13| Finally, Section |III-C| considers different design 
problems concerning IIR filters in an l p sense, including IIR ver- 
sions of the complex, frequency-varying and magnitude filter design 
problems as well as the proposed algorithms and their corresponding 
results. 



A. IIR filters 

An IIR filter describes a system with input x(n) and output y(n), 
related by the following expression 

M N 

y( n ) = b ( k ) x ( n - k ) - a ( k )y( n - k ) 

k=0 k = l 

Since the current output y(n) depends on the input as well as on N 
previous output values, the output of an IIR filter might not be zero 
well after x(n) becomes zero (hence the name "Infinite"). Typically 
IIR filters are described by a rational transfer function of the form 



H(z) 



B(z) _ b + biz 1 -\ \-b M z~ 

A(z) ~ 1 + aiz" 1 + ■ 



where 



H(z) = jTh(r 



(49) 



(50) 



and h{n) is the infinite impulse response of the filter. Its frequency 
response is given by 



H(u) = H(z)\ z=eJ « 
Substituting \A9\ into {5T} we obtain 



H(u) = 



A{u) 



M 

E b n e- j " n 

n=0 

N 

1 + E 

n=l 



(51) 



(52) 
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Given a desired frequency response D[ui), the h IIR design problem 
consists of solving the following problem 

2 



mm 



B(u) 



A(u) 



D(u) 



(53) 



for the M + N + 1 real filter coefficients a n , b„ with uj £ Q, (where 
VL is the set of frequencies for which the approximation is done). A 
discrete version of {53} is given by 
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(54) 



where Uk are the L frequency samples over which the approximation 
is made. Clearly, ( |54) is a nonlinear least squares optimization 
problem with respect to the filter coefficients. 

B. Least squares design of IIR filters 

Section |III-A| introduced the IIR least squares design problem, 
as illustrated in \5A\ . Such problem cannot be solved in the same 
manner as in the FIR case; therefore more sophisticated methods 
must be employed. As will be discussed later in Section [III-C| some 
tradeoffs are desirable for l p optimization. As in the case of FIR 
design, when designing l p IIR filters one must use h methods as 
internal steps over which one iterates while moving between diferent 
values of p. Clearly this internal iteration must not be too demanding 
computationally since an outer l p loop will invoke it repeatedly (this 
process will be further illustrated in Section |TiI-Cl[ >. With this issue in 
mind, one needs to select an h algorithm that remains accurate within 
reasonable error bounds while remaining computationally efficient. 

This section begins by summarizing some of the traditional ap- 
proaches that have been employed for h rational approximation, both 
within and outside filter design applications. Amongst the several 
existing traditional nonlinear optimization approaches, the Davidon- 
Fletcher-Powell (DFP) and the Gauss-Newton methods have been 
often used and remain relatively well understood in the filter design 
community. A brief introduction to both methods is presented in 
Section [III-B1 1 and their caveats briefly explored. 

An alternative to attacking a complex nonlinear problem like {54} 
with general nonlinear optimization tools consists in linearization, an 
attempt to "linearize" a nonlinear problem and to solve it by using 
linear optimization tools. Multiple efforts have been applied to similar 
problems in different areas of statistics and systems analysis and 
design. Section [III-B2| introduces the notion of an Equation Error, a 
linear expression related to the actual Solution Error that one is in- 
terested in minimizing in h design. The equation error formulation is 
nonetheles important for a number of filter design methods (including 
the ones presented in this work) such as Levy's method, one of the 
earliest and most relevant frequency domain linearization approaches. 
Section [III-B4| presents a frequency domain equation error algorithm 
based on the methods by Prony and Pade. This algorithm illustrates 
the usefulness of the equation error formulation as it is fundamental 
to the implementation of the methods proposed later in this work (in 
Section jlTFC} . 

An important class of linearization methods fall under the name 
of iterative prefiltering algorithms, presented in Section |III-B8| 
The Sanathanan-Koerner (SK) algorithm and the Steiglitz-McBride 
(SMB) methods are well known and commonly used examples in this 
category, and their strengths and weaknesses are explored. Another 
recent development in this area is the method by Jackson, also 
presented in this section. Finally, Soewito's quasilinearization (the 
method of choice for least squares IIR approximation in this work) 
is presented in Section [III-B 1 3| 



1) Traditional optimization methods: One way to adress \54\ is 
to attempt to solve it with general nonlinear optimization tools. 
One of the most typical approach in nonlinear optimization is to 
apply either Newton's method or a Newton-based algorithm. One 
assumption of Newton's method is that the optimization function 
resembles a quadratic function near the solution. In order to update a 
current estimate, Newton's method requires first and second order 
information through the use of gradient and Hessian matrices. A 
quasi-Newton method is one that estimates in a certain way the 
second order information based on gradients (by generalizing the 
secant method to multiple dimensions). 

One of the most commonly used quasi-Newton methods in IIR 
filter design is the Davidon-Fletcher-Powell (DFP) method |20|. In 
1970 K. Steiglitz [49] used the DFP method to solve an IIR magnitude 
approximation to a desired real frequency response. For stability 
concerns he used a cascade form of the IIR filter given in l|49[> through 



1 + a r z 1 + b r z 



+ C r Z~ 



(55) 



Therefore he considered the following problem, 



min ( a TT - 

a„,b„,c„,d„ ^— ' \ 1\ 1 



1 + a r e- JtJk + b r e~ 2i " k 



D(w k ) 



His method is a direct implementation of the DFP algorithm in the 
problem described above. 

In 1972 Andrew Deczky |50| employed the DFP algorithm to solve 
a complex IIR least-p approximation to a desired frequency response. 
Like Steiglitz, Deczky chose to employ the cascaded IIR structure of 
{55}, mainly for stability reasons but also because he claims that 
for this structure it is simpler to derive the first order information 
required for the DFP method. 

The MATLAB Signal Processing Toolbox includes a function 
called INVFREQZ, originally written by J. Smith and J. Little (22) . 
Invfreqz uses the algorithm by Levy (see j piI-B3| > as an initial step 
and then begins an iterative algorithm based on the damped Gauss- 
Newton |21| to minimize the solution error e s according to the least- 
squared error criteria. This method performs a line search after every 
iteration to find the optimal direction for the next step. Invfreqz 
evaluates the roots of A(z) after each iteration to verify that the 
poles of H{z) lie inside the unit circle; otherwise it will convert the 
pole into its reciprocal. This approach guarantees a stable filter. 

Among other Newton-based approaches, Spanos and Mingori |23| 
use a Newton algorithm combined with the Levenberg-Marquardt 
technique to improve the algorithm's convergence properties. Their 
idea is to express the denominator function A(u) as a sum of second- 
order rational polynomials. Thus H(cj) can be written as 



b r + ju!/3 r 
+ ju)/3 T - 



d 



Their global descent approach is similar to the one presented in [24|. 
As any Newton-based method, this approach suffers under a poor 
initial guess, and does not guarantee to converge (if convergence 
occurs) to a local minimum. However, in such case, convergence is 
quadratic. 

Kumaresan's method |25| considers a three-step approach. It is not 
clear whether his method attempts to minimize the equation error or 
the solution error. He uses divided differences |26| to reformulate 
the solution error in terms of the coefficients at- Using Lagrange 
multiplier theory, he defines 



S = y T C T [CC T ]- l Cy 



(56) 
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where y = [H0H1 ■ ■ ■ Hl~i] t contains the frequency samples 
and C is a composition matrix containing the frequency divided 
differences and the coefficients a k (a more detailed derivation can 
be found in [51]). Equation l |56[ > is iterated until convergence of the 
coefficient vector a is reached. This vector is used as initial guess in 
the second step, involving a Newton-Raphson search of the optimal 
a that minimizes ||£||2. Finally the vector b is found by solving a 
linear system of equations. 

2) Equation error linearization methods: Typically general use 
optimization tools prove effective in finding a solution. However in 
the context of IIR filter design, they often tend to take a rather large 
number of iterations, generate large matrices or require complicated 
steps like solving or estimating (and often inverting) vectors and 
matrices of first and second order information [35|. Using gradient- 
based tools for nonlinear problems like {54} certainly seems like 
a suboptimal approach. Also, typical Newton-based methods tend 
to converge quick (quadratically), yet they make assumptions about 
radii of convergence and initial proximity to the solution (otherwise 
performance is suboptimal). In the context of filter design one 
should wonder if better performance could be achieved by exploiting 
characteristics from the problem. This section introduces the concept 
of linearization, an alternative to general optimization methods that 
has proven successful in the context of rational approximation. The 
main idea behind linearization approaches consists in transforming a 
complex nonlinear problem into a sequence of linear ones, an idea 
that is parallel to the approach followed in our development of IRLS 
l p optimization. 

A common notion used in this work (as well as some of the 
literature related to linearization and filter design) is that there are 
two different error measures that authors often refer to. It is important 
to recognize the differences between them as one browses through 
literature. Typically one would be interested in minimizing the I2 
error given by: 



e = ]T |£K)| 2 = ]T \A(cj k )D(ju; k ) - B[ 



(60) 



Observing the linear structure (in the coefficients A k , B k ) of equation 
{60}, Levy proposed minimizing the quantity e. He actually realized 
that this measure (what we would denote as the equation error) was 
indeed a weighted version of the actual solution error that one might 
be interested in; in fact, the denominator function A(ui) became the 
weighting function. 

Levy's proposed method for minimizing \6Q) begins by writing s 
as follows, 

L 

£ = ^2 [{RkO-k - UhTkh - ctk) 2 + {u] k r k Rk + or k I k - UJ k /3k) 2 ] 
fe=0 

(61) 

by recognizing that {59} can be reformulated in terms of its real and 
imaginary parts, 

a + juj/3 



with 



a + juj/3 =(Bo — B-iuj 1 + B^lu 4 • • • ) 

+ joj(Bi — B3L0 2 + B 5 uj 4 ■ • ■ ) 

a + jojT =(Ao — A2LU 2 + A4U) 4 ■ • ■ ) 

+ ju(A! - A z u 2 + A 5 u A ■■■) 

and performing appropriate manipulation^ Note that the optimal set 
of coefficients A k , B k must satisfy 



de 
dA 



Ax 



de 
dB 







\\£H\\l 



D(u) 



(57) 



This quantity is often referred to as the solution error (denoted by e s ); 
we refer to the function S(ui) in {57} as the solution error function, 
denoted by £ s (iS). Also, in linearization algorithms the following 
measure often arises, 



e=\\S(uj)f 2 = \\A(u)D(u) - B(u)\\ 



(58) 



This measure is often referred to as the equation error e e ; we denote 



the function £(ui) in i58i as the equation error function £^{lo). 
Keeping the notation previously introduced, it can be seen that the 
two errors relate by one being a weighted version of the other, 

£ b (uj) = A(u)£ s (u) 

3) Levy's method: E. C. Levy |27| considered in 1959 the follow- 
ing problem in the context of analog systems (electrical networks to 
be more precise): defin^] 

= ^°+^;+^;;+--- = m m 



Ao + Ai(Joj) + A 2 {juj) 2 + ■ 



Given L samples of a desired complex- valued function D(jujk) = 
R(u>h) + jl(wk) (where R, I are both real funtions of ui), Levy 
defines 



£(w) = D(ju) - H(ju) = D(ju 



A(u) 



The conditions introduced above generate a linear system in the filter 
coefficients. Levy derives the system 



Cx = y 



where C = {Ci C2} with 



Ci 



Ao 





-Aa 





A 4 





A 2 





-A 4 





A 2 





-A4 





A 6 


Tl 


-S a 


-T z 


Si 


T 5 


S2 


n 


-S A 


-Ts 


S 6 


T 3 


-Si 


-T s 


S 6 


T- 


Ti 


5 2 


-T 3 


-Si 


T 5 


-5a 


T 3 


5 4 


-n 


-s ( 


T z 


5 4 


-T 5 


-s 6 




u 2 





-Ui 





u 6 





U A 





-U e 





u 4 





-U 





Us 



(62) 



1 For consistency with the rest of this document, notation has been modified 
from the author's original paper whenever deemed necessary. 



2 For further details on the algebraic manipulations involved, the reader 
should refer to 1271. 
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and 



r Bo i 

B 2 



< 



-4 2 



So 
Ti 

S-2 




L'4 



(63) 



with 



\h = 



U h = 



L-l 

E wf-Ri 

L-l 
/=() 



Solving for the vector x from l |62f gives the desired coefficients (note 
the trivial assumption that Ao = 1). It is important to remember 
that although Levy's algorithm leads to a linear system of equations 
in the coefficients, his approach is indeed an equation error method. 
Matlab's invf reqz function uses an adaptation of Levy's algorithm 
for its least-squares equation error solution. 

4) Prony-based equation error linearization: A number of algo- 
rithms that consider the approximation of functions in a least-squared 
sense using rational functions relate to Prony's method. This section 
summarizes these methods especially in the context of filter design. 

5) Prony's method: The first method considered in this section is 
due to Gaspard Riche Baron de Prony, a Lyonnais mathematician and 
physicist which, in 1795, proposed to model the expansion properties 
of different gases by sums of damped exponentials. His method |29| 
approximates a sampled function f(n) (where /(n) = for n<0) 
with a sum of TV exponentials, 



f( n ) = E cfceSfc ™ = E Ck ^ 



(64) 



where Afc = e Sfc . The objective is to determine the TV parameters Ck 
and the TV parameters Sk in {64} given 27V samples of f(n). 
It is possible to express l|64) in matrix form as follows, 

/(0) 
/(I) 



" 1 

Ai 


1 

A 2 • 


1 " 

Ajv 




Cl 
C2 




. Af" 1 


A 2 


X N-l 
' N 




Cjv 





L f(N-l) J 

(65) 

System l |65} has a Vandermonde structure with TV equations, but 2TV 
unknowns (both Ch and A^ are unknown) and thus it cannot be solved 
directly. Yet the major contribution of Prony's work is to recognize 



that f(n) as given in i64i is indeed the solution of a homogeneous 
order-TV Linear Constant Coefficient Difference Equation (LCCDE) 
|52| ch. 4] given by 



N 
p=0 



P) = 



(66) 



with ao = 1. Since f(n) is known for < n < 2N 
extend |66l into an (N x N) system of the form 



1, we can 



f(N-l) 
f(N) 



f(N- 



2) 
1) 



/(0) 
/(I) 



L /(2/v 

where 



2) /(27V -3) 



/ = 





ai 






a 2 












ajv 





(67) 



f(N- 

~f(N) 
-f(N + l) 

-/(27V - 1) 

which we can solve for the coefficients a p . Such coefficients are then 
used in the characteristic equation (53| §2.3] of l |66| >, 

A^ + aiA^ -1 + ■ • ■ + ajv_iA + ajv = (68) 



The TV roots A^ of q67j are called the characteristic roots of q66p. 
From the A^ we can find the parameters using = In Afc. Finally, 
it is now possible to solve {65} for the parameters Ck- 

The method described above is an adequate representation of 
Prony's original method [29]. More detailed analysis is presented in 
1 54 1 — 1 57 1 and |58, §11.4]. Prony's method is an adequate algorithm 
for interpolating 2TV data samples with TV exponentials. Yet it is 
not a filter design algorithm as it stands. Its connection with IIR 
filter design, however, exists and will be discussed in the following 
sections. 

6) Pade's method: The work by Prony served as inspiration to 
Henry Pade, a French mathematician which in 1892 published a work 
1 30 1 discussing the problem of rational approximation. His objective 
was to approximate a function that could be represented by a power 
series expansion using a rational function of two polynomials. 

Assume that a function f(x) can be represented with a power 
series expansion of the form 



/(*) = E 



CkX 



Pade's idea was to approximate f(x) using the function 

B(x) 



where 



and 



m A {x) 

M 



(69) 



(70) 



fc=0 
N 



A(x) = 1 + a,kX k 



k=l 

The objective is to determine the coefficients and bk so that the 
first M + TV + 1 terms of the residual 

r{x) = A{x)f(x) - B{x) 

dissappear (i.e. the first TV + M derivatives of f(x) and f(x) are 
equal (55)). That is, |60) , 



"(x) = A(x) CkX k — B{x) — x 



M+N+l 



kX 



To do this, consider A(x)f(x) = B(x) |56| 

(1 + aix H + apfX N ) ■ (co + cix + ■ ■ ■ + CiX 1 H ) = 

bo + b\x + ■ ■ ■ + biAX 1 



19 



By equating the terms with same exponent up to order M + N + 1, 
we obtain two sets of equations, 



Co = 60 

aico + ci = 61 

(I2C0 + aici + C2 = 62 

a-iCo + a,2C\ + a\C2 + C3 = 63 

<inCm-n + o.n-iCm-n+1 + ■ ■ ■ + cm — bm 



(71) 



oncm-n+i + an-iCM-N+2 + ■ ■ ■ + cm+i 

ClNCM-N+2 + ajV-lCM-JV+3 + ■ ■ ■ + Cm+2 



clncm + <XN-lCM+l + ■ ■ ■ + Cm+N 







(72) 



Equation <[72p represents an TV x N system that can be solved for 
the coefficients a k given c(n) for 0<n<iV + M. These values can 
then be used in \71\ to solve for the coefficients b k . The result is a 
system whose impulse response matches the first N + M + 1 values 
of f(n). 

7) Prony-based filter design methods: Both the original methods 
by Prony and Pade were meant to interpolate data from applications 
that have little in common with filter design. What is relevant to this 
work is their use of rational functions of polynomials as models for 
data, and the linearization process they both employ. 

When designing FIR filters, a common approach is to take L 
samples of the desired frequency response D(uj) and calculate the 
inverse DFT of the samples. This design approach is known as fre- 
quency sampling. It has been shown 1 28 1 that by designing a length- 
L filter h(n) via the frequency sampling method and symmetrically 
truncating h(n) to N values (N <C L) it is possible to obtain a least- 
squares optimal length- AT filter /ijv(n). It is not possible however 
to extend completely this method to the IIR problem. This section 
presents an extension based on the methods by Prony and Pade, and 
illustrates the shortcomings of its application. 

Consider the frequency response defined in \52\ . One can choose 
L equally spaced samples of H(ui) to obtain 



H(ujk) = Hk = 



B k 



for k = 0, 1, 



(73) 



where A k and B k represent the length-!/ DFTs of the filter coeffi- 
cients a n and b n respectively. The division in j73} is done point-by- 
point over the L values of A k and B k . The objective is to use the 
relationship in described in {73} to calculate a n and b n . 

One can express < |V 3 1 > as B k = H k A k . This operation represents 
the length-!/ circular convolution 6(n) = h(n) (Z) a(n) defined as 
follows (43| §8.7.5] 

L-l 

b(n) = h(n)(D)a(n) 

m=0 

(74) 

where h(n) is the length-!/ inverse DFT of H k and the operator 
((•))i represents modulo L. Let 



1 







and b 



bo 
In 



I'M 





(75) 



Therefore l |74) can be posed as a matrix operation (28| §7.4.1] of the 
form 

Ho = b (76) 



where H = [H1H2] with 

h 
hi 



Hi 



h M 

h L -2 
h L -i 



h L -i 
ho 

hjvi-i 
h M 

h L -s 
h L -2 



hh-N 
hh-N+i 



h((L-N+M)) L 
\{L-N+M+\)) L 

hh-N-2 
hh-N-1 



hh-N-i 
hh-N 



l ((L-N+M-l)) L 
h((L-N+M)) L 

hh-N-3 
hh-N -2 



h 2 
h 3 



hi 
I12 



llM+2 hM+1 
hM+3 hM+2 



h 
hi 



h L -i 
ho 



Hence H is an L x L matrix. From 1 75 1 it is clear that the L — (JV+1) 
rightmost columns of H can be discarded (since the last L — (N + 1) 
values of & in l |75| > are equal to 0). Therefore equation l |76[ > can be 
rewritten as 



hi 



h L -i 
ho 



hivi hfrf-i 
hM+i hM 

hL-2 hL-3 

hh-i hL-2 



or in matrix notation 



h 



lL-N+1 



H(L-N+M)) L 
((L-N+M+1)) L 

llL-N-2 
hL-N-1 



1 

"1 

ON 



bo 
bi 



bm 




H 



1 




" b ' 


a 








Ha 





(77) 



(78) 



where a and b correspond to the length-iV and (M + 1) filter 
coefficient vectors respectively and H contains the first N + 1 
columns of H. It is possible to uncouple the calculation of a and b 
from |78l by breaking H furthermore as follows, 



H 



ho 


h L -i 


n-L-N 


hi 


h 


hL-N+l 


hM 


hM-i 


h((L-N+M)) L 


hM+i 


hM 


h((L-N+M+l)) L 


hL-2 


h L -3 


hL-N-2 


h L -i 


h L -2 


hL-N-1 



Therefore 



H 





Hi 




a — 


' b ' 




H 2 








(79) 
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with 



as defined in \7fy . This formulation allows to uncouple the calcula- 
tions for a and b using two systems, 



Hid 
H 2 d 



= b 
= 



Note that the last equation can be expressed as 
H2d = —hi 



(80) 



where H2 = [h% H2] (that is, hi and H2 contain the first and 
second through iV-th columns of H2 respectively). 

From i jsof one can conclude that ifL = N + M + l and if H2 
and Hi are nonsingular, then they can be inverted |^] to solve for the 
filter coefficient vectors a in {80} and solve for b using Hid = b. 

The algorithm described above is an interpolation method rather 
than an approximation one. If L > N + M + 1 and H2 is full column 
rank then ( |80) is an overdetermined linear system for which no exact 
solution exists; therefore an approximation must be found. From {73} 
we can define the solution error function £ B (uJk) as 



A(cj k ) 



H(LG k ) 



(81) 



Using this notation, the design objective is to solve the nonlinear 
problem 

min ||£ 4 (wjfc)||2 



Consider the system in equation {78}. If H2 is overdetermined, one 
can define an approximation problem by introducing an error vector 
e, 

b — Hd e 



(82) 



where 



ei 
e 2 



Again, it is possible to uncouple {82} as follows, 

b = Hid. ei 
e2 = hi + H 2 a 



(83) 
(84) 



One can minimize the least-squared error norm 1 1 ea 1 1 2 of the overde- 
termined system {84} by solving the normal equations |2I| 



m ho 



H2H20 



so that 



a = -[HlHan'H^ 



and use this result in {83} 

b = Hid (85) 
Equation {83} represents the following time-domain operation, 

e(n) = b(n) - h(n) © a(n) , 0<n<M 

(where © denotes circular convolution) and can be interpreted in the 
frequency domain as follows, 

£ e (uu k ) = S(w fe ) - H(oj k )A(u k ) (86) 

Equation {86} is a weighted version of ( |81) , as follows 

£ e (iO k ) = A(Wfc)£ s (u>fc) 

3 In practice one should not invert the matrices Hi and H2 but use a more 
robust and efficient algorithm. See |61 1 for details. 



Therefore the algorithm presented above will find the filter coefficient 
vectors a and b that minimize the equation error £ e in {86} in the 
least-squares sense. Unfortunately, this error is not what one may 
want to optimize, since it is a weighted version of the solution error 

8) Iterative prefiltering linearization methods: Section |III-B2| in- 
troduced the equation error formulation and several algorithms that 
minimize it. In a general sense however one is more interested 
in minimizing the solution error problem from {54}. This section 
presents several algorithms that attempt to minimize the solution 



error formulation from 1 53 1 by prefiltering the desired response D(w) 
in \5S\ with A(uj). Then a new set of coefficients {a n ,b n } are 
found with an equation error formulation and the prefiltering step 
is repeated, hence defining an iterative procedure. 

9) Sanathanan-Koerner (SK) method: The method by Levy pre- 
sented in Section |III-B3| suggests a relatively easy-to-implement 
approach to the problem of rational approximation. While interesting 
in itself, the equation error e e does not really represent what in 
principle one would like to minimize. A natural extension to Levy's 
method is the one proposed |31| by C. K. Sanathanan and J. 
Koerner in 1963. The algorithm iteratively prefilters the equation 
error formulation of Levy with an estimate of A(lj). The SK method 
considers the solution error function £ s defined by 

B(oj) 1 . . . . „ , . 

[A{u)D{u) 



£ s (oj) = D(u) 



A(u) 
1 

aW, 



B(u) 



£ e (uj) 



Then the solution error problem can be written as 



mm e s 



(87) 



(88) 



where 



£s = X^i£-( w fc)i 2 



fc=0 
L 



fc=0 1 K " 

= W{uj)\£ e {uj k )Y 



(89) 



Note that given A(u), one can obtain an estimate for B(lj) by 
minimizing £ e as Levy did. This approach provides an estimate, 
though, because one would need to know the optimal value of A(u) 
to truly optimize for B(oj). The idea behind this method is that by 
solving iteratively for A(uj) and B{ui) the algorithm would eventually 
converge to the solution of the desired solution error problem defined 



by 1 88 1. Since A(uj) is not known from the beginning, it must be 
initialized with a reasonable value (such as A(uj k ) — 1). 

To solve {88} Sanathanan and Koerner defined the same linear 
system from {62} with the same matrix and vector definitions. 
However the scalar terms used in the matrix and vectors reflect the 
presence of the weighting function W(oj) in e s as follows, 



Ah 
Sh 



1=0 

L—l 

1=0 

L—l 

1=0 

L—l 
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Then, given an initial definition of A(oj), at the p-th iteration one 
sets 

W(u) = 1 (90) 

\Ap-i{u)k)\ 

and solves \62\ using {A, S, T, U} as defined above until a con- 
vergence criterion is reached. Clearly, solving l |88[ > using {89) is 
equivalent to solving a series of weighted least squares problems 
where the weighting function consists of the estimated values of A(u) 
from the previous iteration. This method is similar to a time-domain 
method proposed by Steiglitz and McBride [34], presented later in 
this chapter. 

10) Method of Sid-Ahmed, Chottera and Jullien: The methods 
by Levy and Sanathanan and Koerner did arise from an analog 
analysis problem formulation, and cannot therefore be used directly 
to design digital filters. However these two methods present important 
ideas that can be translated to the context of filter design. In 1978 
M. Sid-Ahmed, A. Chottera and G. Jullien followed on these two 
important works and adapted |32| the matrix and vectors used by 
Levy to account for the design of IIR digital filters, given samples 
of a desired frequency response. Consider the frequency response 



H(uS) defined in \52\. In parallel with Levy's development, the 



corresponding equation error can be written as 



(91) 



with 



F k {u) = I (R k + jl k ) 1 + a-ie 



5> 



One can follow a similar differentiation step as Levy by setting 

de e de e de e 



dai a,2 



dbo 



= ... = 



with as defined in l |91} . Doing so results in a linear system of the 
form 

Cx = y 



where the vectors x and y are given by 

bo 1 <Po — ro 



x = < 



ai 
a N 



y = < 



4>M — Tm 
-ft 



(92) 



The matrix C has a special structure given by 
C = 



Y 



where SE 1 and T are symmetric Toeplitz matrices of order M + 1 
and N respectively, and their first row is given by 



Tim 



Vm-1 



for m ; 
for m ; 



.,M+ 1 
.,N 



Matrix <1? has order M + 1 X N and has the property that elements 
on a given diagonal are identical (i.e. = <f>i+i i j+i). Its entries 
are given by 



+ r m 

-2 — r„ 



for m - 
for m - 



.,N 

.,M + 1 



The parameters {77, <j>, r, /3} are given by 



rji = cos iui k 
k-o 

L 

Pi = ^ \D(ujk)\ 2 cos iuj k 

k=0 
L 

4>i = S ^2,Rk COS lUJ k 
k=0 
L 

Ti = y~] h sin iwk 

fe=0 



for 0<i<M 



f or < i < N - 1 



for < i < max(iV, M — 1) 



for < i < max(iV, M — 1) 



The rest of the algorithm works the same way as Levy's. For a 
solution error approach, one must weight each of the parameters 
mentioned above with the factor from ^90\ as in the SK method. 

There are two important details worth mentioning at this point: 
on one hand the methods discussed up to this point (Levy, SK and 
Sid-Ahmed et al.) do not put any limitation on the spacing of the 
frequency samples; one can sample as fine or as coarse as desired in 
the frequency domain. On the other hand there is no way to decouple 
the solution of both numerator and denominator vectors. In other 
words, from |63j and \92\ one can see that the linear systems that 
solve for vector x solve for all the variables in it. This is more 
of an issue for the iterative methods (SK & Sid-Ahmed), since at 
each iteration one solves for all the variables, but for the purposes of 
updating one needs only to keep the denominator variables (they get 
used in the weighting function); the numerator variables are never 
used within an iteration (in contrast to Burrus' Prony-based method 
presented in Section |TlI-B4[ >. This approach decouples the numerator 
and denominator computation into two separate linear systems. One 
only needs to compute the denominator variables until convergence 
is reached, and only then it becomes necessary to compute the 
numerator variables. Therefore most of the iterations solve a smaller 
linear system than the methods involved up to this point. 

11) Steiglitz-McBride iterative algorithm: In 1965 K. Steiglitz 
and L. McBride presented an algorithm [33) , |34| that has become 
quite popular in statistics and engineering applications. The Steiglitz- 
McBride method (SMB) considers the problem of deriving a transfer 
function for either an analog or digital system from their input and 
output data; in essence it is a time-domain method. Therefore it is 
mentioned in this work for completeness as it closely relates to the 
methods by Levy, SK and Sid-Ahmed, yet it is far better known and 
understood. 

The derivation of the SMB method follows closely that of SK. In 
the Z-domain, the transfer function of a digital system is defined by 



H{z) = 



B{z) _ b + b lZ 1 + . . . + b N z~ 
A(z) 1 + a,\Z-\ + . . . + aNZ~ 



Furthermore 



Y{z) = H(z)X(z) 



B(z) 
A(z) 



X(z) 



Steiglitz and McBride define the following problem, 



2ttj 



X(z) 



B(z) 
A(z) 



D(z] 



dz 



(93) 



where X(z) = ^jXjz"- 1 and D(z) = ^jdjz' 3 represent the 
z-transforms of the input and desired signals respectively. Equation 
{93) is the familiar nonlinear solution error function expressed in the 
Z-domain. Steiglitz and McBride realized the complexity of such 
function and proposed the iterative solution J93l using a simpler 
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problem defined by 

min e e = £ j( 



j \X{z)B(z) - D{z)A{z)\ 2 ^ 



(94) 

This linearized error function is the familiar equation error in the 
Z-domain. Steiglitz and McBride proposed a two-mode iterative 
approach. The SMB Mode 1 iteration is similar to the SK method, 
in that at the k-th iteration a linearized error criterion based on (94) 
is used, 



£k(z) 



Bk(z) -X(z)- a M "\ d(z) 



where 



W k (z) [B k {z)X{z) - A k {z)D{z)] 
W k (z) 



(95) 



A k -i{z) 

Their derivatioij^] leads to the familiar linear system 

Cx = y 

with the following vector definitions 

r bo i 



d 



-N+l 
■3-1 



ajv I dj-N 
The vector is referred to as the input-output vector. Then 



y 



SMB Mode 2 is an attempt at reducing further the error once 
Mode 1 produces an estimate close enough to the actual solution. 
The idea behind Mode 2 is to consider the solution error defined by 
(93) and equate its partial derivatives with respect to the coefficients 
to zero. Steiglitz and McBride showed |33| , |34| that this could be 
attained by defining a new vector 



-JV+l 



Vj- 



Vj-N 



Then 
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E^ r J 



y 



The main diference between Mode 1 and Mode 2 is the fact that 
Mode 1 uses the desired values to compute its vectors and matrices, 
whereas Mode 2 uses the actual output values from the filter. The 
rationale behind this is that at the beggining, the output function 
y(t) is not accurate, so the desired function provides better data for 
computations. On the other hand, Mode 1 does not really solve the 

4 For more details the reader should refer to |33|, |34|. 



desired problem. Once Mode 1 is deemed to have reached the vicinity 
of the solution, one can use true partial derivatives to compute the 
gradient and find the actual solution; this is what Mode 2 does. 

It has been claimed that under certain conditions the Steiglitz- 
McBride algorithm converges. However no guarantee of global 
convergence exists. A more thorough discussion of the Steiglitz- 
McBride algorithm and its relationships to other parameter estimation 
algorithms (such as the Iterative Quadratic Maximum Likelihood 
algorithm, or IQML) are found in [62|-|64 |. 

12) Jackson's method: The following is a recent approach (from 
2008) by Leland Jackson |36| based in the frequency domain. 
Consider vectors a € R N and b € R M such that 



AH 



where H(uj), B(uj), A(uj) are the Fourier transforms of h, b and a 
respectively. For a discrete frequency set one can describe Fourier 
transform vectors B = W;,6 and A — W a a (where W(,,W a 
correspond to the discrete Fourier kernels for b, a respectively). 
Define 



H a (u k ) = 



A(oj k ) 



In vector notation, let D a = diag(iJ a ) = diag(l/A). Then 

H(u) = = H*(u)B(u) => H = T> a B (96) 

A(LO) 

Let Ha(u) be the desired complex frequency response. Define Dd = 
diag(Hd)- Then one wants to solve 

min E*E = \\E\\l 

where E — H — Hd- From (96) one can write H = Hd + E as 

H = ~D a B = T> a W b b (97) 

Therefore 

H d = H-E = D a W b b - E (98) 
Solving (98| for b one gets 

b = (D a W b )\H d (99) 

Also, 

H d = T> d i = D d D a A = D a G d A = D a D d W a a 
where I is a unit column vector. Therefore 

HE — Hd = D Q D d W a a 

From (98) we get 

D Q W 6 b E = D a D d W a a 

or 

D Q D d W a a + E — D a W 6 b 
which in a least squares sense results in 

a = (D a D d W a )\(D a W ft b) (100) 

From (99) one gets 

a = (D a D d W a )\(D a Wi,[(D Q W 6 )\ff d ]) 



As a summary, at the i-th iteration one can write l |98[ > and (TOOf as 
follows, 

b s = (diag(l/A l _ 1 )W b )\iJ d 

a, = (diag(l/A l _ 1 )diag(iJ d )W a )\(diag(l/A l _i)W b b I ) 
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13) Soewito's quasilinearization method: Consider the equation 
error residual function 

e(cjh) = B(uj k ) - D{uj k ) ■ A(ui k ) 

M / N 

= J2 b n e- jUkn - D(u k ) ■ 1 + Yl a » e 

n=0 \ n=l 

= &o + he-^" +■■■+ b M e- J " kM ■ ■ ■ 

-D k - D kai e-^ k D k a N e~ iUhN 

= (bo + ■ • ■ b M e- jUkM ) 

D k (a ie - ju * +---a N e-i" kN ) - D k 

with D k = D(ui k ). The last equation indicates that one can represent 
the equation error in matrix form as follows, 

e = Fh — D 

where F = [Fi F2] and 

1 e~ 3U ° 



where the pq-th term of J is given by J 



Fi 



-juoM 



1 e' 3 " 1 - 1 
-D e- j "° 



-D e- juj ° N 



and 



A, 



6 
61 

h = &m and D = 

1 

ajv 

Consider now the solution error residual function 

a(u k ) = H(u k ) - D(uj k ) = ^4 ^ D M 

A(uj k ) 

[B(uj k ) - D{u) k ) ■ A(wfc)] 



= W(tJ fc )e(ajfc) 

Therefore one can write the solution error in matrix form as follows 
s = W(Fh D) (101) 



where W is a diagonal matrix with -jj^rj m i ts diagonal. From 1 
the least-squared solution error e s = s*s can be minimized by 



101 



From 



h = (F*W 2 F) _1 F*W 2 D 
102 1 an iteratioi£] could be defined as follows 
h l+1 = (F*W 2 F)- 1 F*W l 2 J D 



(102) 



by setting the weights W in 1 101 1 equal to A k (cj), the Fourier 



transform of the current solution for a. 

A more formal approach to minimizing e 3 consists in using a 
gradient method (these approaches are often referred to as Newton- 
like methods). First one needs to compute the Jacobian matrix J of s, 

5 Soewito refers to this expression as the Steiglitz-McBride Mode-1 in 
frequency domain. 



PI — 8h, 



with s as defined 



in < | 1 1 [ > . Note that the p-th element of s is given by 

Bp 

Sp — Hp Dp — — Dp 
Ap 

For simplicity one can consider these reduced form expressions for 
the independent components of h, 

M 



dsp 
db~ q 

ds p 
da a 



_L JL 

Ar, db„ 
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Therefore on can express the Jacobian J as follows, 
J = WG 

where G = [Gi G2] and 

1 e - juo 



(103) 
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Consider the solution error least-squares problem given by 
min f(h) = s T s 

h 

where s is the solution error residual vector as defined in \101\ and 
depends on h. It can be shown [21 pp. 219] that the gradient of the 
squared error f(h) (namely V/) is given by 

V/ = J*s (104) 

A necessary condition for a vector h to be a local minimizer of f(h) 
is that the gradient V/ be zero at such vector. With this in mind and 
combining (TOT) and fT03> in fT04> one gets 

V/ = G*W 2 (Fh-D) = (105) 

Solving the system {105} gives 

h = (G*W 2 F) -1 G*W 2 £> 

An iteration can be defined as follows^] 

hi+i = (G*WiF) _1 G*Wi-D (106) 

where matrices W and G reflect their dependency on current values 
of a and b. 

Atmadji Soewito |35| expanded the method of quasilinearization 
of Bellman and Kalaba 1 65 1 to the design of IIR filters. To understand 
his method consider the first order of Taylor's expansion near Hi(z), 
given by 

H i+1 (z) = Hi(z)+ 

[B i+1 (z) - Bi(z)]Aj(z) - [A i+ i(z) - Ai{z)]Bi{z) 
AH*) 

= H l {z) + 

B l+1 (z) - Bi(z) B l (z)[A l+1 (z) - Ai(z)] 



Ai{z) 



Al{z) 



6 Soewito refers to this expression as the Steiglitz-McBride Mode-2 in 
frequency domain. Compare to the Mode-1 expression and the use of Gi 
instead of F. 
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Using the last result in the solution error residual function s(uj) and 
applying simplification leads to 

, . B i+ i(w) Hi(u)A i+ i(uj) Bi(u) . . 
Ai(uj) Ai{uj) Ai{uj) 

- -^—[B 1+1 {oj) - Hi(uj)A i+1 (tj) + Bi(u) . . . (107) 



( Given qp, 6q 



At(w 



Ai(w)D(oj)] 



Equation |107| can be expressed (dropping the use of uj for simplic- 
ity) as 

s = w(( [B i+1 - H,{A l+1 - l)]-Hi) + ( [Bt - D(Ai - 1)]-Z?)) 

(108) 

One can recognize the two terms in brackets as Ghi+i and Fhi 
respectively. Therefore {108} can be represented in matrix notation 
as follows, 



s = W[Gh l+1 -{D + H z - Fhi)] 
with H = [Ho, -Hi, • • • , H L - 1 ] T . Therefore one can minimize s s 



from ( |109| > with 

h i+1 = (G*W*G i y 1 G*W*(D + Hi - Fhi) (110) 
since all the terms inside the parenthesis in JTTol are constant at the 



(i + l)-th iteration. In a sense, 1 1 1 is similar to 1 106 1, where the 
desired function is updated from iteration to iteration as in \l 10\ . 

It is important to note that any of the three algorithms can be 
modified to solve a weighted li IIR approximation using a weighting 
function W(ui) by defining 



V(u) = 



W(uj) 



(111) 



Taking \ \ 1 1\ into account, the following is a summary of the three 
different updates discussed so far: 

SMB Frequency Mode-1: h i+1 = (F*V?F) _1 F* Y^D 
SMB Frequency Mode-2: h i+ i = (G*V^F) _1 G*V^jD 
Soewito's quasilinearization: hi+i = (G*V^Gi) _1 * . . . 

GjVf (£) + Hi- Fhi) 

C. l p approximation 

Infinite Impulse Response (IIR) filters are important tools in signal 
processing. The flexibility they offer with the use of poles and zeros 
allows for relatively small filters meeting specifications that would 
require somewhat larger FIR filters. Therefore designing IIR filters 
in an efficient and robust manner is an inportant problem. 

This section covers the design of a number of important l p IIR 
problems. The methods proposed are consistent with the methods 
presented for FIR filters, allowing one to build up on the lessons 
learned from FIR design problems. The complex l p IIR problem is 
first presented in Section |III-C1| being an essential tool for other 
relevant problems. The l p frequency-dependent IIR problem is also 
introduced in Section [III-C 1 1 While the frequency-dependent formu- 
lation might not be practical in itself as a filter design formulation, 
it is fundamental for the more relevant magnitude l p IIR filter design 
problem, presented in Section [III-C2| 

Some complications appear when designing IIR filters, among 
which the intrinsic least squares solving step clearly arises from 
the rest. Being a nonlinear problem, special handling of this step 
is required. It was detemined after thorough experimentation that the 
quasilinearization method of Soewito presented in Section |III-B13| 
can be employed successfully to handle this issue. 




L weighting 



Solve 1^ complex 
V ^ — problem via L 2 

quasilinearization 



( END ) 



(109) Fig. 20. Block diagram for complex l p IIR algorithm. 



1) Complex and frequency-dependent l p approximation: Chapter 
|n| introduced the problem of designing l p complex FIR filters. The 
complex l p IIR algorithm builds up on its FIR counterpart by 
introducing a nested structure that internally solves for an I2 complex 
IIR problem. Figure [20] illustrates this procedure in more detail. This 
method was first presented in (66). 
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Fig. 21. Results for complex Zioo IIR design. 

Compared to its FIR counterpart, the IIR method only replaces the 
weighted linear least squares problem for Soewito's quasilinearization 
algorithm. While this nesting approach might suggest an increase 
in computational expense, it was found in practice that after the 
initial h iteration, in general the l p iterations only require from 
one to only a few internal weighted I2 quasilinearization iterations, 
thus maintaining the algorithm efficiency. Figures [21] through [23] 
present results for a design example using a length-5 IIR filter 
with p — 100 and transition edge frequencies of 0.2 and 0.24 (in 
normalized frequency). 

Figure [2T] compares the h and l p results and includes the desired 
frequency samples. Note that no transition band was specified. Figure 
[22] illustrates the effect of increasing p. The largest error for the l 2 
solution is located at the transition band edges. As p increases the 
algorithm weights the larger errors heavier; as a result the largest 
errors tend to decrease. In this case the magnitude of the frequency 
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Magnitude soln. 
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magnitude Lp problem 




Fig. 22. Maximum error for I2 and £100 complex IIR designs. 



Fig. 24. Block diagram for magnitude l p IIR method. 
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Fig. 23. Error curve for (100 complex IIR design. 



response went from 0.155 at the stopband edge (in the I2 case) to 
0.07 (for the l p design). Figure [23] shows the error function for the 
l p design, illustrating the quasiequiripple behavior for large values of 
P- 

Another fact worth noting from Figure [21] is the increase in the 
peak in the right hand side of the passband edge (around / = 0.22). 
The l p solution increased the amplitude of this peak with respect to 
the corresponding I2 solution. This is to be expected, since this peak 
occurs at frequencies not included in the specifications, and since the 
lp algorithm will move poles and zeros around in order to meet find 
the optimal l p solution (based on the frequencies included for the 
filter derivation). The addition of a specified transition band function 
(such as a spline) would allow for control of this effect, depending 
on the user's preferences. 

The frequency-dependent FIR problem was first introduced in 
Section |II-E| Following the FIR approach, one can design IIR 
frequency-dependent filters by merely replacing the linear weighted 
least squares step by a nonlinear approach, such as the quasilineariza- 
tion method presented in Section |III-B13| (as in the complex l p IIR 
case). This problem illustrates the flexibility in design for l v IRLS- 
based methods. 

2) Magnitude l p IIR design: The previous sections present algo- 
rithms that are based on complex specifications; that is, the user must 
specify both desired magnitude and phase responses. In some cases it 
might be better to specify a desired magnitude response only, while 



allowing an algorihm to select the phase that optimally minimizes the 
magnitude error. Note that if an algorithm is given a phase in addition 
to a magnitude function, it must then make a compromise between 
approximating both functions. The magnitude l p IIR approximation 
problem overcomes this dilemma by posing the problem only in terms 
of a desired magnitude function. The algorithm would then find the 
optimal phase that provides the optimal magnitude approximation. A 
mathematical formulation follows, 



min 



\D(u)\ 



B(uj;b) 



A(lo; a) 



(112) 



A critical idea behind the magnitude approach is to allow the 
algorithm to find the optimum phase for a magnitude approximation. 
It is important to recognize that the optimal magnitude filter indeed 
has a complex frequency response. Atmadji Soewito |35| published 
in 1990 a theorem in the context of I2 IIR design that demonstrated 
that the phase corresponding to an optimal magnitude approximation 
could be found iteratively by updating the desired phase in a complex 
approximation scenario. In other words, given a desired complex 
response Do one can solve a complex I2 problem and take the 
resulting phase to form a new desired response D + from the original 
desired magnitude response with the new phase. That is, 

A+i = |D |e** 

where Do represents the original desired magnitude response and 
e 3 ^* is the resulting phase from the previous iteration. This approach 
was independently suggested (36) by Leland Jackson and Stephen 
Kay in 2008. 

This work introduces an algorithm to solve the magnitude l p IIR 
problem by combining the IRLS-based complex l p IIR algorithm 
from Section |III-C1| with the phase updating ideas from Soewito, 
Jackson and Kay. The resulting algorithm is robust, efficient and flex- 
ible, allowing for different orders in the numerator and denominator 
as well as even or uneven sampling in frequency space, plus the 
optional use of specified transition bands. A block diagram for this 
method is presented in Figure [24] 

The overall l p IIR magnitude procedure can be summarized as 
follows, 

1) Experimental analysis demonstrated that a reasonable initial 
solution for each of the three main stages would allow for faster 
convergence. It was found that the frequency domain Prony 
method by Burrus |28| (presented in Section |III-B4[ > offered 
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a good initial guess. In Figure [24] this method is iterated to 
update the specified phase. The outcome of this step would be 
an equation error l 2 magnitude design. 

2) The equation error I2 magnitude solution from the previous 
step initializes a second stage where one uses quasilinearization 
to update the desired phase. Quasilinearization solves the true 
solution error complex approximation. Therefore by iterating 
on the phase one finds at convergence a solution error I2 
magnitude design. 

3) The rest of the algorithm follows the same idea as in the 
previous step, except that the least squared step becomes 
a weighted one (to account for the necessary l p homotopy 
weighting). It is also crucial to include the partial updating 
introduced in Section |II-B2| By iterating on the weights one 
would find a solution error l p magnitude design. 

Figures |25] through [29] illustrate the effectiveness of this algorithm 
at each of the three different stages for length-5 filters a and b, with 
transition edge frequencies of 0.2 and 0.24 (in normalized frequency) 
and p = 30. A linear transition band was specified. Figures [25] [25] 
and[25]show the equation error I2, solution error I2 and solution error 
l p . Figure [28] shows a comparison of the magnitude error functions 
for the solution error I2 and l p designs. Figure [29] shows the phase 
responses for the three designs. 
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Fig. 25. Equation error I2 magnitude design. 

From Figures[28]and[29]one can see that the algorithm has changed 
the phase response in a way that makes the maximum magnitude 
error (located in the stopband edge frequency) to be reduced by 
approximately half its value. Furthermore, Figure [28] demonstrates 
that one can reach quasiequiripple behavior with relatively low values 
of p (for the examples shown, p was set to 30). 
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Fig. 26. Solution error I2 magnitude design. 



IV. Conclusions 



Digital filters are essential building blocks for signal processing 
applications. One of the main goals of this work is to illustrate 
the versatility and relevance of l v norms in the design of digital 
filters. While popular and well understood, l 2 and loo filters do tend 
to accentuate specific benefits from their respective designs; filters 
designed using l p norms as optimality criteria can offer a tradeoff 
between the benefits of these two commonly used criteria. This work 
presented a number of applications of L p norms in both FIR and IIR 
filter design, and their corresponding design algorithms and software 
implementation. 

The basic workhorse for the methods presented in this document 
is the Iterative Reweighted Least Squares algorithm, a simple yet 
powerful method that sets itself naturally adept for the design of l p 
filters. The notion of converting a mathematically complex problem 
into a series of significantly easier optimization problems is common 
in optimization. Nevertheless, the existence results from Theorem 
[T] strongly motivate the use of IRLS methods to design l p filters. 
Knowing that optimal weights exist that would turn the solution 
of a weighted least squared problem into the solution of a least-p 
problem must at the very least captivate the curiosity of the reader. 
The challenge lies in finding a robust and efficient method to find 
such weights. All the methods presented in this work work under 
this basic framework, updating iteratively the weighting function of 
a least squares problem in order to find the optimal l p filter for a given 
application. Therefore it is possible to develop a suite of computer 
programs in a modular way, where with few adjustments one can 
solve a variety of problems. 

Throughout this document one can find examples of the versatility 
of the IRLS approach. One can change the internal linear objective 
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Fig. 27. Solution error l p magnitude design. 
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Fig. 28. Comparison of I2 and l p IIR magnitude designs 



function from a complex exponential kernel to a sinusoidal one to 
solve complex and linear phase FIR filters respectively using the same 
algorithm. Further adaptations can be incorporated with ease, such 
as the proposed adaptive solution to improve robustness. 

Another important design example permits to make p into a 
function of frequency to allow for different p-norms in different 
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Fig. 29. Phase responses for I2 and l p IIR magnitude designs. 



frequency bands. Such design merely requires a few changes in the 
implementation of the algorithm, yet allows for fancier, more elegant 
problems to be solved, such as the Constrained Least Squares (CLS) 
problem. In the context of FIR filters, this document presents the CLS 
problem from an l p prespective. While the work by John Adams 
1 16| set a milestone in digital filter design, this work introduces a 
strong algorithm and a different perspective to the problem from 
that by Adams and other authors. The IRLS i p -based approach from 
this work proves to be robust and flexible, allowing for even and 
uneven sampling. Furthermore, while a user can use fixed transition 
bands, one would benefit much from using a flexible transition 
band formulation, where the proposed IRLS-based algorithm literally 
finds the optimal transition band definition based on the constraint 
specifications. Such flexibility allows for tight constrains that would 
otherwise cause other algorithms to fail meeting the constraint speci- 
fications, or simply not converging at all. Section [Tl-F| introduced two 
problem formulations as well as results that illustrate the method's 
effectiveness at solving the CLS problem. 

While previous work exists in the area of FIR design (or in linear 
l p approximation for that matter), the problem of designing l p IIR 
filters has been far less explored. A natural reason for this is the 
fact that h IIR design is in itself an open research area (and a 
rather complicated problem as well). Traditional linear optimization 
approaches cannot be directly used for either of these problems, 
and nonlinear optimization tools often prove either slow or do not 
converge. 

This work presents the l p IIR design problem as a natural extension 
of the FIR counterpart, where in a modular fashion the linear 
weigthed I2 section of the algorithms is replaced by a nonlinear 
weighted I2 version. This problem formulation allows for the IIR 
implementation of virtually all the IRLS FIR methods presented in 
Chapter [n] Dealing with the weighted nonlinear I2 problem is a 
different story. 

The problem of rational h approximation has been studied for 
some time. However the sources of ideas and results related to 
this problem are scattered across several areas of study. One of 
the contributions of this work is an organized summary of efforts 
in rational I2 optimization, particularly related to the design of IIR 
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digital filters. The work in Section [III-B | also lays down a framework 
for the IIR methods proposed in this work. 

As mentioned in Section |III-C| some complications arise when 
designing IIR l p filters. Aside from the intrinsic I2 problem, it is 
necessary to properly combine a number of ideas that allowed for ro- 
bust and efficient l p FIR methods. A design algorithm for complex l p 
IIF filters were presented in Section [III-C1| this algorithm combined 
Soewito's quasilinearization with ideas such as l p homotopy, partial 
updating and the adaptive modification. In practice, the combination 
of these ideas showed to be practical and the resulting algorithm 
remained robust. It was also found that after a few p-steps, the internal 
h algorithm required from one to merely a few iterations on average, 
thus maintaining the algorithm efficient. 

One of the main contributions of this work is the introduction 
of an IRLS-based method to solve l p IIR design problems. By 
properly combining the principle of magnitude approximation via 
phase updating (from Soewito, Jackson and Kay) with the complex 
IIR algorithm one can find optimal magntiude l p designs. This work 
also introduced a sequence of steps that improve the efficiency and 
robustness of this algorithm, by dividing the design process into three 
stages and by using suitable initial guesses for each stage. 

Some of the examples in this document were designed using 
Matlab programs. It is worth to notice the common elements between 
these programs, alluding to the modularity of the implementations. 
An added benefit to this setup is that further advances in any of the 
topics covered in this work can easily be ported to most if not all of 
the algorithms. 

Digital filter design is and will remain an important topic in digital 
signal processing. It is the hope of the author to have motivated in 
the reader some curiosity for the use of l p norms as design criteria 
for applications in FIR and IIR filter design. This work is by no 
means comprehensive, and is meant to inspire the consideration of 
the flexibility of IRLS algorithms for new l p related problems. 
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